**Dirk Beyer Marieke Huisman (Eds.)**

# **Tools and Algorithms for the Construction and Analysis of Systems**

**24th International Conference, TACAS 2018 Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2018 Thessaloniki, Greece, April 14–20, 2018, Proceedings, Part II**

# Lecture Notes in Computer Science 10806

Commenced Publication in 1973. Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

### Editorial Board

David Hutchison, UK; Josef Kittler, UK; Friedemann Mattern, Switzerland; Moni Naor, Israel; Bernhard Steffen, Germany; Doug Tygar, USA; Takeo Kanade, USA; Jon M. Kleinberg, USA; John C. Mitchell, USA; C. Pandu Rangan, India; Demetri Terzopoulos, USA; Gerhard Weikum, Germany

# Advanced Research in Computing and Software Science Subline of Lecture Notes in Computer Science

Subline Series Editors

Giorgio Ausiello, University of Rome 'La Sapienza', Italy Vladimiro Sassone, University of Southampton, UK

Subline Advisory Board

Susanne Albers, TU Munich, Germany; Benjamin C. Pierce, University of Pennsylvania, USA; Bernhard Steffen, University of Dortmund, Germany; Deng Xiaotie, City University of Hong Kong; Jeannette M. Wing, Microsoft Research, Redmond, WA, USA

More information about this series at http://www.springer.com/series/7407


Editors Dirk Beyer Ludwig-Maximilians-Universität München Munich Germany

Marieke Huisman University of Twente Enschede The Netherlands

ISSN 0302-9743 ISSN 1611-3349 (electronic) Lecture Notes in Computer Science ISBN 978-3-319-89962-6 ISBN 978-3-319-89963-3 (eBook) https://doi.org/10.1007/978-3-319-89963-3

Library of Congress Control Number: 2018940138

LNCS Sublibrary: SL1 – Theoretical Computer Science and General Issues

© The Editor(s) (if applicable) and The Author(s) 2018. This book is an open access publication.

Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by the registered company Springer International Publishing AG part of Springer Nature

The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

## ETAPS Foreword

Welcome to the proceedings of ETAPS 2018! After a somewhat coldish ETAPS 2017 in Uppsala in the north, ETAPS this year took place in Thessaloniki, Greece. I am happy to announce that this is the first ETAPS with gold open access proceedings. This means that all papers are accessible by anyone for free.

ETAPS 2018 was the 21st instance of the European Joint Conferences on Theory and Practice of Software. ETAPS is an annual federated conference established in 1998, and consists of five conferences: ESOP, FASE, FoSSaCS, TACAS, and POST. Each conference has its own Program Committee (PC) and its own Steering Committee. The conferences cover various aspects of software systems, ranging from theoretical computer science to foundations to programming language developments, analysis tools, formal approaches to software engineering, and security. Organizing these conferences in a coherent, highly synchronized conference program facilitates participation in an exciting event, offering attendees the possibility to meet many researchers working in different directions in the field, and to easily attend talks of different conferences. Before and after the main conference, numerous satellite workshops take place and attract many researchers from all over the globe.

ETAPS 2018 received 479 submissions in total, 144 of which were accepted, yielding an overall acceptance rate of 30%. I thank all the authors for their interest in ETAPS, all the reviewers for their peer reviewing efforts, the PC members for their contributions, and in particular the PC (co-)chairs for their hard work in running this entire intensive process. Last but not least, my congratulations to all authors of the accepted papers!

ETAPS 2018 was enriched by the unifying invited speaker Martin Abadi (Google Brain, USA) and the conference-specific invited speakers (FASE) Pamela Zave (AT&T Labs, USA), (POST) Benjamin C. Pierce (University of Pennsylvania, USA), and (ESOP) Derek Dreyer (Max Planck Institute for Software Systems, Germany). Invited tutorials were provided by Armin Biere (Johannes Kepler University, Linz, Austria) on modern SAT solving and Fabio Somenzi (University of Colorado, Boulder, USA) on hardware verification. My sincere thanks to all these speakers for their inspiring and interesting talks!

ETAPS 2018 took place in Thessaloniki, Greece, and was organised by the Department of Informatics of the Aristotle University of Thessaloniki. The university was founded in 1925 and currently has around 75000 students; it is the largest university in Greece. ETAPS 2018 was further supported by the following associations and societies: ETAPS e.V., EATCS (European Association for Theoretical Computer Science), EAPLS (European Association for Programming Languages and Systems), and EASST (European Association of Software Science and Technology). The local organization team consisted of Panagiotis Katsaros (general chair), Ioannis Stamelos, Lefteris Angelis, George Rahonis, Nick Bassiliades, Alexander Chatzigeorgiou, Ezio Bartocci, Simon Bliudze, Emmanouela Stachtiari, Kyriakos Georgiadis, and Petros Stratis (EasyConferences).

The overall planning for ETAPS is the main responsibility of the Steering Committee, and in particular of its Executive Board. The ETAPS Steering Committee consists of an Executive Board and representatives of the individual ETAPS conferences, as well as representatives of EATCS, EAPLS, and EASST. The Executive Board consists of Gilles Barthe (Madrid), Holger Hermanns (Saarbrücken), Joost-Pieter Katoen (chair, Aachen and Twente), Gerald Lüttgen (Bamberg), Vladimiro Sassone (Southampton), Tarmo Uustalu (Tallinn), and Lenore Zuck (Chicago). Other members of the Steering Committee are: Wil van der Aalst (Aachen), Parosh Abdulla (Uppsala), Amal Ahmed (Boston), Christel Baier (Dresden), Lujo Bauer (Pittsburgh), Dirk Beyer (Munich), Mikolaj Bojanczyk (Warsaw), Luis Caires (Lisbon), Jurriaan Hage (Utrecht), Rainer Hähnle (Darmstadt), Reiko Heckel (Leicester), Marieke Huisman (Twente), Panagiotis Katsaros (Thessaloniki), Ralf Küsters (Stuttgart), Ugo Dal Lago (Bologna), Kim G. Larsen (Aalborg), Matteo Maffei (Vienna), Tiziana Margaria (Limerick), Flemming Nielson (Copenhagen), Catuscia Palamidessi (Palaiseau), Andrew M. Pitts (Cambridge), Alessandra Russo (London), Dave Sands (Göteborg), Don Sannella (Edinburgh), Andy Schürr (Darmstadt), Alex Simpson (Ljubljana), Gabriele Taentzer (Marburg), Peter Thiemann (Freiburg), Jan Vitek (Prague), Tomas Vojnar (Brno), and Lijun Zhang (Beijing).

I would like to take this opportunity to thank all speakers, attendees, organizers of the satellite workshops, and Springer for their support. I hope you all enjoy the proceedings of ETAPS 2018. Finally, a big thanks to Panagiotis and his local organization team for all their enormous efforts that led to a fantastic ETAPS in Thessaloniki!

February 2018 Joost-Pieter Katoen

## Preface

TACAS 2018 is the 24th edition of the International Conference on Tools and Algorithms for the Construction and Analysis of Systems conference series. TACAS 2018 is part of the 21st European Joint Conferences on Theory and Practice of Software (ETAPS 2018). The conference is held in the hotel Makedonia Palace in Thessaloniki, Greece, during April 16–19, 2018.

Conference Description. TACAS is a forum for researchers, developers, and users interested in rigorously based tools and algorithms for the construction and analysis of systems. The conference aims to bridge the gaps between different communities with this common interest and to support them in their quest to improve the utility, reliability, flexibility, and efficiency of tools and algorithms for building systems. TACAS solicits five types of submissions:


New Items in the Call for Papers. There were three new items in the call for papers, which we briefly discuss.


– Artifact Evaluation. For the first time, TACAS 2018 included an optional artifact evaluation (AE) process for accepted papers. An artifact is any additional material (software, data sets, machine-checkable proofs, etc.) that substantiates the claims made in a paper and ideally makes them fully replicable. The evaluation and archival of artifacts improves replicability and traceability for the benefit of future research and the broader TACAS community.

Paper Selection. This year, 154 papers were submitted to TACAS, among which 115 were research papers, 6 case-study papers, 26 regular tool papers, and 7 were tool-demonstration papers. After a rigorous review process, with each paper reviewed by at least 3 program committee (PC) members, followed by an online discussion, the PC accepted 35 research papers, 2 case-study papers, 6 regular tool papers, and 2 tool-demonstration papers (45 papers in total).

Competition on Software Verification (SV-COMP). TACAS 2018 also hosted the 7th International Competition on Software Verification (SV-COMP), chaired and organized by Tomas Vojnar. The competition again had a high participation: 21 verification systems with developers from 11 countries were submitted for the systematic comparative evaluation, including two submissions from industry. This volume includes short papers describing 9 of the participating verification systems. These papers were reviewed by a separate program committee (PC); each of the papers was assessed by four reviewers. One session in the TACAS program was reserved for the presentation of the results: the summary by the SV-COMP chair and the participating tools by the developer teams.

Artifact-Evaluation Process. The authors of each of the 45 accepted papers were invited to submit an artifact immediately after the acceptance notification. An artifact evaluation committee (AEC), chaired by Arnd Hartmanns and Philipp Wendler, reviewed these artifacts, with 2 reviewers assigned to each artifact. The AEC received 33 artifact submissions, of which 24 were successfully evaluated (73% acceptance rate) and have been awarded the TACAS AEC badge, which is added to the title page of the respective paper. The AEC used a two-phase reviewing process: Reviewers first performed an initial check of whether the artifact was technically usable and whether the accompanying instructions were consistent, followed by a full evaluation of the artifact. In addition to the textual reviews, reviewers also provided scores for consistency, completeness, and documentation. The main criterion for artifact acceptance was consistency with the paper, with completeness and documentation being handled in a more lenient manner as long as the artifact was useful overall. Finally, TACAS provided authors of all submitted artifacts the possibility to publish and permanently archive a "camera-ready" version of their artifact on https://springernature.figshare.com/tacas, with the only requirement being an open license assigned to the artifact. This possibility was used for 20 artifacts, while 2 more artifacts were archived independently by the authors.

Acknowledgments. We would like to thank all the people who helped to make TACAS 2018 successful. First, the chairs would like to thank the authors for submitting their papers to TACAS 2018. The reviewers did a great job in reviewing papers: They contributed informed and detailed reports and took part in the discussions during the virtual PC meeting. We also thank the steering committee for their advice. Special thanks go to the general chair, Panagiotis Katsaros, and his overall organization team, to the chair of the ETAPS 2018 executive board, Joost-Pieter Katoen, who took care of the overall organization of ETAPS, to the EasyConference team for the local organization, and to the publication team at Springer for solving all the extra problems that our introduction of the new artifact-evaluation process caused.

March 2018 Dirk Beyer Marieke Huisman (PC Chairs) Goran Frehse (Tools Chair) Tomas Vojnar (SV-COMP Chair) Arnd Hartmanns Philipp Wendler (AEC Chairs)

### Organization

#### Program Committee

– Wolfgang Ahrendt, Chalmers University of Technology, Sweden
– Dirk Beyer (Chair), Ludwig-Maximilians-Universität München, Germany
– Armin Biere, Johannes Kepler University Linz, Austria
– Lubos Brim, Masaryk University, Czech Republic
– Franck Cassez, Macquarie University, Australia
– Alessandro Cimatti, FBK-irst, Italy
– Rance Cleaveland, University of Maryland, USA
– Goran Frehse, University of Grenoble Alpes – Verimag, France
– Jan Friso Groote, Eindhoven University of Technology, The Netherlands
– Gudmund Grov, Norwegian Defence Research Establishment (FFI), Norway
– Orna Grumberg, Technion – Israel Institute of Technology, Israel
– Arie Gurfinkel, University of Waterloo, Canada
– Klaus Havelund, Jet Propulsion Laboratory, USA
– Matthias Heizmann, University of Freiburg, Germany
– Holger Hermanns, Saarland University, Germany
– Falk Howar, TU Clausthal/IPSSE, Germany
– Marieke Huisman (Chair), University of Twente, The Netherlands
– Laura Kovacs, Vienna University of Technology, Austria
– Jan Kretinsky, Technical University of Munich, Germany
– Salvatore La Torre, Università degli studi di Salerno, Italy
– Kim Larsen, Aalborg University, Denmark
– Axel Legay, IRISA/Inria, Rennes, France
– Yang Liu, Nanyang Technological University, Singapore
– Rupak Majumdar, MPI-SWS, Germany
– Tiziana Margaria, Lero, Ireland
– Rosemary Monahan, National University of Ireland Maynooth, Ireland
– David Parker, University of Birmingham, UK
– Corina Pasareanu, CMU/NASA Ames Research Center, USA
– Alexander K. Petrenko, ISP RAS, Russia
– Zvonimir Rakamaric, University of Utah, USA
– Kristin Yvonne Rozier, Iowa State University, USA
– Natasha Sharygina, USI Lugano, Switzerland
– Stephen F. Siegel, University of Delaware, USA
– Bernhard Steffen, University of Dortmund, Germany
– Stavros Tripakis, University of California, Berkeley, USA
– Frits Vaandrager, Radboud University, The Netherlands
– Tomas Vojnar, Brno University of Technology, Czech Republic


### Program Committee and Jury — SV-COMP

Tomáš Vojnar (Chair) Peter Schrammel (representing 2LS) Jera Hensel (representing AProVE) Michael Tautschnig (representing CBMC) Vadim Mutilin (representing CPA-BAM-BnB) Mikhail Mandrykin (representing CPA-BAM-Slicing) Thomas Lemberger (representing CPA-Seq) Hussama Ismail (representing DepthK) Felipe Monteiro (representing ESBMC-incr) Mikhail R. Gadelha (representing ESBMC-kind) Martin Hruska (representing Forester) Zhao Duan (representing InterpChecker) Herbert Oliveira Rocha (representing Map2Check) Veronika Šoková (representing PredatorHP) Franck Cassez (representing Skink) Marek Chalupa (representing Symbiotic) Matthias Heizmann (representing UAutomizer) Alexander Nutz (representing UKojak) Daniel Dietsch (representing UTaipan) Priyanka Darke (representing VeriAbs) Pritom Rajkhowa (representing VIAP) Liangze Yin (representing Yogar-CBMC)

### Artifact Evaluation Committee (AEC)

Arnd Hartmanns (Chair) Philipp Wendler (Chair) Pranav Ashok Maryam Dabaghchian Daniel Dietsch Rohit Dureja Felix Freiberger Karlheinz Friedberger Frederik Gossen Samuel Huang Antonio Iannopollo Omar Inverso Nils Jansen Sebastiaan Joosten

Eunsuk Kang Sean Kauffman Ondrej Lengal Tobias Meggendorfer Malte Mues Chris Novakovic David Sanan

#### Additional Reviewers

Aarssen, Rodin Alzuhaibi, Omar Andrianov, Pavel Asadi, Sepideh Ashok, Pranav Bacci, Giovanni Bainczyk, Alexaner Baranowski, Marek Barringer, Howard Ben Said, Najah Benerecetti, Massimo Benes, Nikola Bensalem, Saddek Berzish, Murphy Biewer, Sebastian Biondi, Fabrizio Blahoudek, František Blicha, Martin Bosselmann, Steve Bruttomesso, Roberto Butkova, Yuliya Casagrande, Alberto Caulfield, Benjamin Ceska, Milan Chen, Wei Chimento, Jesus Mauricio Cleophas, Loek Cordeiro, Lucas Dabaghchian, Maryam Darulova, Eva de Vink, Erik Delzanno, Giorgio Dietsch, Daniel Du, Xiaoning

Dureja, Rohit Dvir, Nurit Ehlers, Rüdiger Elrakaiby, Yehia Enea, Constantin Faella, Marco Falcone, Ylies Fedotov, Alexander Fedyukovich, Grigory Fox, Gereon Freiberger, Felix Frenkel, Hadar Frohme, Markus Genaim, Samir Getman, Alexander Given-Wilson, Thomas Gleiss, Bernhard Golden, Bat-Chen González De Aledo, Pablo Goodloe, Alwyn Gopinath, Divya Gossen, Frederik Graf-Brill, Alexander Greitschus, Marius Griggio, Alberto Guthmann, Ofer Habermehl, Peter Han, Tingting Hao, Jianye Hark, Marcel Hartmanns, Arnd Hashemi, Vahid He, Shaobo Heule, Marijn

Hoenicke, Jochen Holik, Lukas Horne, Ross Hou, Zhe Hou Hyvärinen, Antti Inverso, Omar Irfan, Ahmed Jabbour, Fadi Jacobs, Swen Jansen, Nils Jensen, Peter Gjøl Joshi, Rajeev Jovanović, Dejan Kan, Shuanglong Kang, Eunsuk Kauffman, Sean Klauck, Michaela Kopetzki, Dawid Kotelnikov, Evgenii Krishna, Siddharth Krämer, Julia Kumar, Rahul König, Jürgen Lahav, Ori Le Coent, Adrien Lengal, Ondrej Leofante, Francesco Li, Jianwen Lime, Didier Lin, Yuhui Lorber, Florian Maarek, Manuel Mandrykin, Mikhail Marescotti, Matteo

Markey, Nicolas Meggendorfer, Tobias Meyer, Philipp Meyer, Roland Micheli, Andrea Mjeda, Anila Moerman, Joshua Mogavero, Fabio Monniaux, David Mordan, Vitaly Murtovi, Alnis Mutilin, Vadim Myreen, Magnus O. Navas, Jorge A. Neele, Thomas Nickovic, Dejan Nies, Gilles Nikolov, Nikola S. Norman, Gethin Nyman, Ulrik Oortwijn, Wytse Pastva, Samuel Pauck, Felix Pavlinovic, Zvonimir Pearce, David Peled, Doron

Poulsen, Danny Bøgsted Power, James Putot, Sylvie Quilbeuf, Jean Rasin, Dan Reger, Giles Reynolds, Andrew Ritirc, Daniela Robillard, Simon Rogalewicz, Adam Roveri, Marco Ročkai, Petr Rüthing, Oliver Šafránek, David Salamon, Andras Z. Sayed-Ahmed, Amr Schieweck, Alexander Schilling, Christian Schmaltz, Julien Seidl, Martina Sessa, Mirko Shafiei, Nastaran Sharma, Arnab Sickert, Salomon Simon, Axel Sloth, Christoffer

Spoto, Fausto Sproston, Jeremy Stan, Daniel Taankvist, Jakob Haahr Tacchella, Armando Tetali, Sai Deep Toews, Manuel Tonetta, Stefano Traonouez, Louis-Marie Travkin, Oleg Trostanetski, Anna van den Bos, Petra van Dijk, Tom van Harmelen, Arnaud Vasilev, Anton Vasilyev, Anton Veanes, Margus Vizel, Yakir Widder, Josef Wijs, Anton Willemse, Tim Wirkner, Dominik Yang, Fei Zakharov, Ilja Zantema, Hans


# Concurrent and Distributed Systems

# **Computing the Concurrency Threshold of Sound Free-Choice Workflow Nets**

Philipp J. Meyer<sup>1(✉)</sup>, Javier Esparza<sup>1</sup>, and Hagen Völzer<sup>2</sup>

<sup>1</sup> Technical University of Munich, Munich, Germany {meyerphi,esparza}@in.tum.de <sup>2</sup> IBM Research, Zurich, Switzerland hvo@zurich.ibm.com

**Abstract.** Workflow graphs extend classical flow charts with concurrent fork and join nodes. They constitute the core of business processing languages such as BPMN or UML Activity Diagrams. The activities of a workflow graph are executed by humans or machines, generically called resources. If concurrent activities cannot be executed in parallel for lack of resources, the time needed to execute the workflow increases. We study the problem of computing the minimal number of resources necessary to fully exploit the concurrency of a given workflow, and execute it as fast as possible (i.e., as fast as with unlimited resources).

We model this problem using free-choice Petri nets, which are known to be equivalent to workflow graphs. We analyze the computational complexity of two versions of the problem: computing the resource and concurrency thresholds. We use the results to design an algorithm to approximate the concurrency threshold, and evaluate it on a benchmark suite of 642 industrial examples. We show that it performs very well in practice: It always provides the exact value, and never takes more than 30 ms for any workflow, even for those with a huge number of reachable markings.

### **1 Introduction**

A *workflow graph* is a classical control-flow graph (or flow chart) extended with concurrent fork and join. Workflow graphs represent the core of workflow languages such as BPMN (Business Process Model and Notation), EPC (Event-driven Process Chain), or UML Activity Diagrams.

In many applications, the activities of a workflow graph have to be carried out by a fixed number of *resources* (for example, a fixed number of computer cores). Increasing the number of resources can reduce the minimal runtime of the workflow. For example, consider a simple deterministic workflow (a workflow without choice or merge nodes), which forks into k parallel activities, all of duration 1, and terminates after a join. With an optimal assignment of resources to activities, the workflow takes time k when executed with one resource, time ⌈k/2⌉ with two resources, and time 1 with k resources; additional resources bring no further reduction. We call k the *resource threshold* of this workflow: the smallest number of resources with which it runs as fast as with unlimited resources. In a deterministic workflow that forks into two parallel chains of k sequential activities each, one resource leads to runtime 2k, and two resources to runtime k. More resources do not improve the runtime, and so the resource threshold is 2. Clearly, the resource threshold of a deterministic workflow with k activities is a number between 1 and k. Determining this number can be seen as a scheduling problem. However, most scheduling problems assume a fixed number of resources and study how to optimize the makespan [11,17], while we study how to minimize the number of resources. Other works on resource/machine minimization [5,6] consider interval constraints instead of the partial-order constraints given by a workflow graph.
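The two examples above can be replayed with a small simulation. The following is only a sketch under simplifying assumptions: every activity has duration 1, the workflow is given as a precedence relation between activities, and the schedule is built by a greedy longest-chain-first heuristic, which happens to be optimal for these two shapes but is not claimed to be optimal in general. The names `greedy_makespan`, `resource_threshold`, `fork`, and `two_chains` are hypothetical helpers introduced for illustration.

```python
def greedy_makespan(preds, resources):
    """Greedy list scheduling of unit-duration activities.

    preds maps each activity to the set of activities that must finish first.
    Returns the number of time steps needed with `resources` workers (>= 1).
    """
    succs = {t: [u for u in preds if t in preds[u]] for t in preds}
    height = {}

    def longest_chain(t):
        # Length of the longest chain of activities starting at t.
        if t not in height:
            height[t] = 1 + max((longest_chain(u) for u in succs[t]), default=0)
        return height[t]

    done, steps = set(), 0
    while len(done) < len(preds):
        # Activities whose predecessors have all finished.
        ready = [t for t in preds if t not in done and preds[t] <= done]
        ready.sort(key=longest_chain, reverse=True)
        done |= set(ready[:resources])
        steps += 1
    return steps


def resource_threshold(preds):
    """Smallest number of resources that matches the unlimited-resource runtime
    (with respect to the greedy schedule above)."""
    unlimited = greedy_makespan(preds, len(preds))
    return next(r for r in range(1, len(preds) + 1)
                if greedy_makespan(preds, r) == unlimited)


def fork(k):
    """k independent activities between a fork and a join."""
    return {f"a{i}": set() for i in range(k)}


def two_chains(k):
    """Two parallel chains of k sequential activities each."""
    preds = {}
    for name in ("x", "y"):
        for i in range(k):
            preds[f"{name}{i}"] = {f"{name}{i - 1}"} if i > 0 else set()
    return preds


print(greedy_makespan(fork(5), 2))        # 5 unit activities on 2 resources
print(resource_threshold(fork(5)))        # fork into 5 activities
print(resource_threshold(two_chains(4)))  # two chains of length 4
```

For the fork into k activities the sketch reports threshold k, and for the two chains it reports 2, matching the discussion above.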

**Fig. 1.** A sound free-choice workflow net and one of its runs (Color figure online)

Following previous work, we do not directly work with workflow graphs, but with their equivalent representation as *free-choice workflow Petri nets*, which has been shown to be essentially the same model [10] and allows us to directly use a wealth of results of free-choice Petri nets [7]. Figure 1(a) shows a free-choice workflow net. The actual workflow activities, also called *tasks*, which need a resource to execute and which consume time are modeled as the places of the net: Each place $p$ of the net is assigned a time $\tau(p)$, depicted in blue. Intuitively, when a token arrives in $p$, it must execute a task that takes $\tau(p)$ time units before it can be used to fire a transition. A free choice exists between transitions $t_4$ and $t_6$, which is a representation of a choice node (if-then-else or loop condition) in the workflow.

If no choice is present or all choices are resolved, we have a deterministic workflow such as the one in Fig. 1(b). In Petri net terminology, deterministic workflows correspond to the class of marked graphs. Deterministic workflows are common in practice: in the standard suite of 642 industrial workflows that we use for experiments, 63.7% are deterministic. We show that already for this restricted class, deciding if the threshold exceeds a given bound is NP-hard. Therefore, we investigate an over-approximation of the resource threshold, already introduced in [4]: the *concurrency threshold*. This is the maximal number of task places that can be simultaneously marked at a reachable marking. Clearly, if a workflow with concurrency threshold k is executed with k resources, then we can always start the task of a place immediately after a token arrives, and this schedule already achieves the fastest runtime achievable with unlimited resources. We show that the concurrency threshold can be computed in polynomial time for deterministic workflows.
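To make the notion of concurrency threshold concrete, the sketch below computes it by exhaustively exploring the reachable markings of a small 1-safe net. This is a brute-force illustration, not the algorithm developed in this paper, and it can take exponential time in general. The net used in the example (places `i`, `a`, `b`, `c`, `o` and transitions `t1`, `t2`, `t3`) is hypothetical and is not the net of Fig. 1; markings of a 1-safe net are represented as sets of marked places.

```python
from collections import deque


def concurrency_threshold(transitions, initial, task_places):
    """Maximal number of task places marked simultaneously at a reachable marking.

    transitions: dict mapping a transition name to (preset, postset), both frozensets.
    initial:     frozenset of initially marked places (1-safe net assumed).
    task_places: frozenset of places that model tasks.
    """
    seen = {initial}
    queue = deque([initial])
    best = len(initial & task_places)
    while queue:
        marking = queue.popleft()
        for pre, post in transitions.values():
            if pre <= marking:                      # transition is enabled
                successor = (marking - pre) | post  # fire it
                if successor not in seen:
                    seen.add(successor)
                    queue.append(successor)
                    best = max(best, len(successor & task_places))
    return best


# Hypothetical example: t1 forks i into a and b, t2 moves a to c, t3 joins b and c.
example = {
    "t1": (frozenset({"i"}), frozenset({"a", "b"})),
    "t2": (frozenset({"a"}), frozenset({"c"})),
    "t3": (frozenset({"b", "c"}), frozenset({"o"})),
}
print(concurrency_threshold(example, frozenset({"i"}), frozenset({"a", "b", "c"})))
```

In this example at most two task places are marked at the same time (for instance `a` and `b`), so two resources suffice to start every task as soon as its token arrives.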

For workflows with nondeterministic choice, corresponding to free-choice nets, we show that computing the concurrency threshold of free-choice workflow nets is NP-hard, solving a problem left open in [4]. We even prove that the problem remains NP-hard for sound free-choice workflows. Soundness is the dominant behavioral correctness notion for workflows, which rules out basic control-flow errors such as deadlocks. NP-hardness in the sound case is remarkable, because many analysis problems that have high complexity in the unsound case can be solved in polynomial time in the sound case (see e.g. [1,7,8]).

After our complexity analysis, we design an algorithm to compute bounds on the concurrency threshold using a combination of linear optimization and state-space exploration. We evaluate it on a benchmark suite of 642 sound free-choice workflow nets from an industrial source (IBM) [9]. The bounds can be computed in a total of 7 s (over all 642 nets). In contrast, the computation of the exact value by state-space exploration techniques times out for the three largest nets, and takes 7 min for the rest. (Observe that partial-order reduction techniques cannot be used, because one may then miss the interleaving realizing the concurrency threshold.)

The paper is structured as follows. Section 2 contains preliminaries. Sections 3 and 4 study the resource and concurrency thresholds, respectively. Section 5 presents our algorithms for computing the concurrency bound, and experimental results. Finally, Sect. 6 contains conclusions.

#### **2 Preliminaries**

**Petri Nets.** A *Petri net* $N$ is a tuple $(P, T, F)$ where $P$ is a finite set of places, $T$ is a finite set of transitions ($P \cap T = \emptyset$), and $F \subseteq (P \times T) \cup (T \times P)$ is a set of arcs. The *preset* of $x \in P \cup T$ is ${}^\bullet x \overset{\text{def}}{=} \{y \mid (y, x) \in F\}$ and its *postset* is $x^\bullet \overset{\text{def}}{=} \{y \mid (x, y) \in F\}$. We extend the definition of presets and postsets to sets of places and transitions $X \subseteq P \cup T$ by ${}^\bullet X \overset{\text{def}}{=} \bigcup_{x \in X} {}^\bullet x$ and $X^\bullet \overset{\text{def}}{=} \bigcup_{x \in X} x^\bullet$. A net is *acyclic* if the relation $F^*$ is a partial order, denoted by $\preceq$ and called the *causal order*. A node $x$ of an acyclic net is *causally maximal* if no node $y$ satisfies $x \prec y$.

A *marking* of a Petri net is a function $M \colon P \to \mathbb{N}$, representing the number of tokens in each place. For a set of places $S \subseteq P$, we define $M(S) \overset{\text{def}}{=} \sum_{p \in S} M(p)$. Further, for a set of places $S \subseteq P$, we denote by $M_S$ the marking with $M_S(p) = 1$ for $p \in S$ and $M_S(p) = 0$ for $p \notin S$.

A transition $t$ is *enabled* at a marking $M$ if for all $p \in {}^\bullet t$, we have $M(p) \ge 1$. If $t$ is enabled at $M$, it may *occur*, leading to a marking $M'$ obtained by removing one token from each place of ${}^\bullet t$ and then adding one token to each place of $t^\bullet$. We denote this by $M \xrightarrow{t} M'$. Let $\sigma = t_1 t_2 \ldots t_n$ be a sequence of transitions. For a marking $M_0$, $\sigma$ is an *occurrence sequence* if $M_0 \xrightarrow{t_1} M_1 \xrightarrow{t_2} \cdots \xrightarrow{t_n} M_n$ for some markings $M_1, \ldots, M_n$. We say that $M_n$ is reachable from $M_0$ by $\sigma$ and denote this by $M_0 \xrightarrow{\sigma} M_n$. The set of all markings reachable from $M$ in $N$ by some occurrence sequence $\sigma$ is denoted by $\mathcal{R}_N(M)$. A *system* is a pair $(N, M)$ of a Petri net $N$ and a marking $M$. A system $(N, M)$ is *live* if for every $M' \in \mathcal{R}_N(M)$ and every transition $t$ some marking $M'' \in \mathcal{R}_N(M')$ enables $t$. The system is *1-safe* if $M'(p) \le 1$ for every $M' \in \mathcal{R}_N(M)$ and every place $p \in P$.
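The firing rule and the notion of occurrence sequence can be sketched directly from these definitions. In the sketch below a net is represented only by its arc set, and a marking by a dictionary from places to token counts; the function names and the small example net are assumptions made for illustration.

```python
def preset(F, x):
    """All nodes y with an arc (y, x) in F."""
    return {y for (y, z) in F if z == x}


def postset(F, x):
    """All nodes z with an arc (x, z) in F."""
    return {z for (y, z) in F if y == x}


def enabled(F, M, t):
    """t is enabled at M if every place in its preset carries a token."""
    return all(M.get(p, 0) >= 1 for p in preset(F, t))


def fire(F, M, t):
    """Marking reached by letting t occur at M; raises if t is not enabled."""
    if not enabled(F, M, t):
        raise ValueError(f"{t} is not enabled")
    M2 = dict(M)
    for p in preset(F, t):
        M2[p] -= 1                    # remove one token from each input place
    for p in postset(F, t):
        M2[p] = M2.get(p, 0) + 1      # add one token to each output place
    return {p: n for p, n in M2.items() if n > 0}


def run(F, M0, sigma):
    """Marking reached from M0 by the occurrence sequence sigma."""
    M = dict(M0)
    for t in sigma:
        M = fire(F, M, t)
    return M


# Hypothetical net: t1 forks i into a and b, t2 moves a to c, t3 joins b and c into o.
F = {("i", "t1"), ("t1", "a"), ("t1", "b"), ("a", "t2"),
     ("t2", "c"), ("b", "t3"), ("c", "t3"), ("t3", "o")}
print(run(F, {"i": 1}, ["t1", "t2", "t3"]))
```

Here `run` raises an error as soon as a transition of the sequence is not enabled, so a successful call certifies that the sequence is an occurrence sequence from the given marking.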

**Convention:** Throughout this paper we assume that systems are 1-safe, i.e., we identify "system" and "1-safe system".

**Net Classes.** A net $N = (P, T, F)$ is a *marked graph* if $|{}^\bullet p| \le 1$ and $|p^\bullet| \le 1$ for every place $p \in P$, and a *free-choice net* if for any two places $p_1, p_2 \in P$ either $p_1^\bullet \cap p_2^\bullet = \emptyset$ or $p_1^\bullet = p_2^\bullet$.
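Both class conditions are purely structural and can be checked directly on the arc relation. The sketch below is a literal transcription of the two definitions; the function names and the three tiny example nets are hypothetical.

```python
from itertools import combinations


def preset(F, x):
    return frozenset(y for (y, z) in F if z == x)


def postset(F, x):
    return frozenset(z for (y, z) in F if y == x)


def is_marked_graph(P, F):
    """Every place has at most one input and at most one output transition."""
    return all(len(preset(F, p)) <= 1 and len(postset(F, p)) <= 1 for p in P)


def is_free_choice(P, F):
    """Any two places have either disjoint or identical postsets."""
    return all(
        not (postset(F, p1) & postset(F, p2)) or postset(F, p1) == postset(F, p2)
        for p1, p2 in combinations(P, 2)
    )


P = {"p", "q"}
sequence = {("p", "t"), ("t", "q")}                                # no choice at all
choice = {("p", "t4"), ("p", "t6"), ("t4", "q"), ("t6", "q")}      # free choice at p
confusion = {("p", "ta"), ("p", "tb"), ("q", "tb")}                # postsets overlap partially

print(is_marked_graph(P, sequence), is_free_choice(P, sequence))
print(is_marked_graph(P, choice), is_free_choice(P, choice))
print(is_marked_graph(P, confusion), is_free_choice(P, confusion))
```

The second net is free-choice but not a marked graph, since `p` has two output transitions; the third is neither, since the postsets of `p` and `q` overlap without being equal.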

**Non-sequential Processes of Petri Nets.** An $(A, B)$*-labeled Petri net* is a tuple $N = (P, T, F, \lambda, \mu)$, where $\lambda \colon P \to A$ and $\mu \colon T \to B$ are *labeling functions* over alphabets $A$, $B$. The nonsequential processes of a 1-safe system $(N, M)$ are acyclic, $(P, T)$-labeled marked graphs. Say that a set $P''$ of places of a $(P, T)$-labeled acyclic net *enables* $t \in T$ if all the places of $P''$ are causally maximal, carry pairwise distinct labels, and $\lambda(P'') = {}^\bullet t$.

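**Definition 1.** *Let* $N = (P, T, F)$ *be a Petri net and let* $M$ *be a marking of* $N$*. The set* $\mathit{NP}(N, M)$ *of* nonsequential processes *of* $(N, M)$ *(* processes *for short) is the set of* $(P, T)$*-labeled Petri nets defined inductively as follows:*

– *The* $(P, T)$*-labeled Petri net containing, for each place* $p$ *marked at* $M$*, one place* $\widehat{p}$ *labeled by* $p$*, no other places, and no transitions, belongs to* $\mathit{NP}(N, M)$*.*

– *If* $\Pi = (P', T', F', \lambda, \mu) \in \mathit{NP}(N, M)$ *and* $P'' \subseteq P'$ *enables some transition* $t$ *of* $N$*, then the* $(P, T)$*-labeled Petri net* $\Pi_t = (P' \uplus \widehat{P},\ T' \uplus \{\widehat{t}\},\ F' \uplus \widehat{F},\ \lambda \uplus \widehat{\lambda},\ \mu \uplus \widehat{\mu})$*, where*

	- $\widehat{P} = \{\widehat{p} \mid p \in t^\bullet\}$*, with* $\widehat{\lambda}(\widehat{p}) = p$*, and* $\widehat{\mu}(\widehat{t}) = t$*;*
	- $\widehat{F} = \{(p'', \widehat{t}) \mid p'' \in P''\} \cup \{(\widehat{t}, \widehat{p}) \mid \widehat{p} \in \widehat{P}\}$*;*

*also belongs to* $\mathit{NP}(N, M)$*. We say that* $\Pi_t$ extends $\Pi$*.*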

*We denote the minimal and maximal places of a process* Π *w.r.t. the causal order by* min(Π) *and* max(Π)*, respectively.*

As usual, we say that two processes are *isomorphic* if they are the same up to renaming of the places and transitions (notice that we rename only the names of the places and transitions, not their labels).

Figure 2 shows two processes of the workflow net in Fig. 1(a). (The figure does not show the names of places and transitions, only their labels.) The net containing the white and grey nodes only is already a process, and the grey places are causally maximal places that enable t6. Therefore, according to the definition we can extend the process with the green nodes to produce another process. On the right we extend the same process in a different way, with the transition t4.

**Fig. 2.** Nonsequential processes of the net of Fig. 1(a) (Color figure online)

The following is well known. Let $\Pi = (P', T', F', \lambda, \mu)$ be a process of $(N, M)$:

– For every linearization $\sigma = t'_1 \ldots t'_n$ of $T'$ respecting the causal order $\preceq$, the sequence $\mu(\sigma) = \mu(t'_1) \ldots \mu(t'_n)$ is a firing sequence of $(N, M)$. Further, all these firing sequences lead to the same marking. We call it the *final marking* of $\Pi$, and say that $\Pi$ leads from $M$ to its final marking.

For example, in Fig. 2 the sequences of the right process labeled by $t_1 t_2 t_3 t_4$ and $t_1 t_3 t_2 t_4$ are firing sequences leading to the marking $M = \{p_2, p_5, p_7\}$.

– For every firing sequence $t_1 \cdots t_n$ of $(N, M)$ there is a process $(P', T', F', \lambda, \mu)$ such that $T' = \{t'_1, \ldots, t'_n\}$, $\mu(t'_i) = t_i$ for every $1 \le i \le n$, and $t'_i \preceq t'_j$ implies $i \le j$.

**Workflow Nets.** We slightly generalize the definition of workflow net as presented in e.g. [1] by allowing multiple initial and final places. A *workflow net* is a Petri net with two distinguished sets I and O of *input places* and *output places* such that (a) •I = ∅ = O• and (b) for all x ∈ P ∪ T, there exists a path from some i ∈ I to some o ∈ O passing through x. The markings M<sub>I</sub> and M<sub>O</sub> are called the initial and final markings of N. A workflow net N is *sound* if


It is well known that every sound free-choice workflow net is a 1-safe system with the initial marking M<sub>I</sub> [2,7]. Given a workflow net according to this definition, one can construct another one with a single input place i, a single output place o, and two transitions t<sub>i</sub>, t<sub>o</sub> with •t<sub>i</sub> = {i}, t<sub>i</sub>• = I and •t<sub>o</sub> = O, t<sub>o</sub>• = {o}. For all purposes of this paper these two workflow nets are equivalent.
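The construction of the single-entry, single-exit net can be sketched in a few lines. The following Python sketch is illustrative only: the representation of a net as sets of places, transitions and arcs, the function name `normalize_workflow_net`, and the default names of the fresh nodes are assumptions made for this example and do not come from the paper.

```python
def normalize_workflow_net(places, transitions, arcs, inputs, outputs,
                           i="i", o="o", t_i="t_i", t_o="t_o"):
    """Add a fresh input place i and output place o (hypothetical helper).

    The new transition t_i consumes from i and produces into every place
    of I; the new transition t_o consumes from every place of O and
    produces into o.
    """
    fresh = {i, o, t_i, t_o}
    if fresh & (set(places) | set(transitions)):
        raise ValueError("names of the fresh nodes are already in use")
    new_places = set(places) | {i, o}
    new_transitions = set(transitions) | {t_i, t_o}
    new_arcs = set(arcs)
    new_arcs.add((i, t_i))                      # pre-set of t_i is {i}
    new_arcs |= {(t_i, p) for p in inputs}      # post-set of t_i is I
    new_arcs |= {(p, t_o) for p in outputs}     # pre-set of t_o is O
    new_arcs.add((t_o, o))                      # post-set of t_o is {o}
    return new_places, new_transitions, new_arcs, {i}, {o}


# Hypothetical net with two input places a, b and one output place c.
P, T, F, I, O = normalize_workflow_net(
    {"a", "b", "c"}, {"t"},
    {("a", "t"), ("b", "t"), ("t", "c")},
    inputs={"a", "b"}, outputs={"c"},
)
```

The original places and arcs are left untouched, so every run of the original net corresponds to a run of the new net that starts with t_i and ends with t_o.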

Given a workflow net N, we say that a process Π of (N,M<sub>I</sub>) is a *run* if it leads to M<sub>O</sub>. For example, the net in Fig. 1(b) is a run of the net in Fig. 1(a).

**Petri Nets with Task Durations.** We consider Petri nets in which, intuitively, when a token arrives in a place p it has to execute a task taking τ(p) time units before the token can be used to fire any transition. Formally, we consider tuples N = (P, T, F, τ) where (P, T, F) is a net and τ : P → ℕ.

**Definition 2.** *Given a nonsequential process* Π = (P′, T′, F′, λ, μ) *of* (N,M)*, a time bound* t*, and a number of resources* k*, we say that* Π is executable within time t with k resources *if there is a function* f : P′ → ℕ *such that*


*We call a function* f *satisfying (1) a* schedule*, a function satisfying (1) and (2) a* t-schedule*, and a function satisfying (1)–(3) a* (k, t)-schedule *of* Π*.*

Intuitively, f(p′) describes the starting time of the task executed at p′. Condition (1) states that if p′<sub>1</sub> causally precedes p′<sub>2</sub>, then the task associated to p′<sub>2</sub> can only start after the task for p′<sub>1</sub> has ended; condition (2) states that all tasks are done by time t; and condition (3) states that at any moment in time at most k tasks are being executed. As an example, the process in Fig. 1(b) can be executed with two resources in time 6 with the schedule i, p1, p2 → 0; p3, p4 → 1; p7, p6 → 3, and p8, p9 → 4.
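The three conditions, as described informally above, can be checked mechanically. The sketch below is a minimal illustration under stated assumptions: a process is reduced to a dictionary `dur` of task durations per place and a list `precedes` of immediate causal predecessor pairs, and all function names as well as the four-place example process are hypothetical (they are not the process of Fig. 1(b)).

```python
def respects_causality(f, dur, precedes):
    """Condition (1): a task starts only after its causal predecessors end."""
    return all(f[p] + dur[p] <= f[q] for p, q in precedes)


def meets_deadline(f, dur, t):
    """Condition (2): every task is finished by time t."""
    return all(f[p] + dur[p] <= t for p in f)


def peak_usage(f, dur):
    """Largest number of tasks running at the same time.

    Start times and durations are natural numbers, so it suffices to
    inspect the integer time points 0, 1, ..., horizon - 1.
    """
    horizon = max((f[p] + dur[p] for p in f), default=0)
    return max(
        (sum(1 for p in f if f[p] <= u < f[p] + dur[p]) for u in range(horizon)),
        default=0,
    )


def is_kt_schedule(f, dur, precedes, k, t):
    """Conditions (1)-(3) together: f is a (k, t)-schedule."""
    return (
        respects_causality(f, dur, precedes)
        and meets_deadline(f, dur, t)
        and peak_usage(f, dur) <= k
    )


# Hypothetical process: a is followed by b and c in parallel, which join in d.
dur = {"a": 1, "b": 2, "c": 2, "d": 0}
precedes = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
parallel = {"a": 0, "b": 1, "c": 1, "d": 3}    # runs b and c side by side
sequential = {"a": 0, "b": 1, "c": 3, "d": 5}  # runs b, then c
```

Checking only immediate predecessors is enough here, because durations are non-negative and the inequalities of condition (1) chain along causal paths.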

Given a process Π = (P′, T′, F′, λ, μ) of (N,M) we define the schedule fmin as follows: if p′ ∈ min(Π) then fmin(p′) = 0, otherwise fmin(p′) = max{fmin(p″) + τ(λ(p″)) | p″ causally precedes p′}. Further, we define the *minimal execution time* tmin(Π) = max{fmin(p′) + τ(λ(p′)) | p′ ∈ max(Π)}. In the process in Fig. 1(b), the schedule fmin is the function that assigns i, p1, p2, p7 → 0, p3, p4 → 1, p6, p8 → 3, p9 → 4, and o → 6, and so tmin(Π) = 6. We have:

**Lemma 1.** *A process* Π = (P′, T′, F′, λ, μ) *of* (N,M) *can be executed within time* tmin(Π) *with* |P′| *resources, and cannot be executed faster with any number of resources.*

*Proof.* For k ≥ |P′| resources condition (3) of Definition 2 holds vacuously. Π is executable within time t iff conditions (1) and (2) hold. Since fmin satisfies (1) and (2) for t = tmin(Π), Π can be executed within time tmin(Π). Further, tmin(Π) is the smallest time for which (1) and (2) can hold, and so Π cannot be executed faster with any number of resources.
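The schedule fmin and the value tmin(Π) amount to a longest-path computation on the causal order. The following Python sketch illustrates this under assumptions: the process is given as a dictionary `dur` of durations and a dictionary `preds` of immediate causal predecessors, and the function names and the example process are hypothetical.

```python
def earliest_schedule(dur, preds):
    """Compute fmin: minimal places start at 0, every other place starts
    as soon as the last of its causal predecessors has finished."""
    f = {}

    def start(p):
        if p not in f:
            f[p] = max((start(q) + dur[q] for q in preds.get(p, ())), default=0)
        return f[p]

    for p in dur:
        start(p)
    return f


def minimal_execution_time(dur, preds):
    """Compute tmin as the latest finishing time under fmin.

    The maximum is taken over all places; since durations are
    non-negative it is always attained at a causally maximal place.
    """
    f = earliest_schedule(dur, preds)
    return max(f[p] + dur[p] for p in dur)


# Hypothetical process: a is followed by b and c in parallel, which join in d.
dur = {"a": 1, "b": 2, "c": 2, "d": 0}
preds = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
fmin = earliest_schedule(dur, preds)
tmin = minimal_execution_time(dur, preds)
```

For this example process, b and c both start at time 1, so two tasks overlap under fmin.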

### **3 Resource Threshold**

We define the resource threshold of a run of a workflow net, and of the net itself. Intuitively, the resource threshold of a run is the minimal number of resources that allows one to execute it as fast as with unlimited resources, and the resource threshold of a workflow net is the minimal number of resources that allows one to execute *every run* as fast as with unlimited resources.

**Definition 3.** *Let* N *be a workflow net, and let* Π *be a run of* N*. The* resource threshold of Π*, denoted by* RT(Π) *is the smallest number* k *such that* Π *can be executed in time* tmin(Π) *with* k *resources. A schedule of* Π realizes *the resource threshold if it is a* (RT(Π), tmin(Π))*-schedule.*

*The* resource threshold *of* N*, denoted by* RT(N)*, is defined by* RT(N) = max{RT(Π) | Π *is a run of* (N,M<sub>I</sub>)}*. A* schedule of N *is a function that assigns to every process* Π ∈ NP(N,M) *a schedule of* Π*. A schedule of* N *is a* (k, t)*-schedule if it assigns to every run* Π *a* (k, t)*-schedule of* Π*. A schedule of* N realizes *the resource threshold if it assigns to every run* Π *a* (RT(N), tmin(Π))*-schedule.*

*Example 1.* We have seen in the previous section that for the process in Fig. 1(b) we have tmin(Π) = 6, and a schedule with two resources already achieves this time. So the resource threshold of this run is 2. The workflow net of Fig. 1 has infinitely many runs in which, loosely speaking, the net executes t4 arbitrarily many times, until it "exits the loop" by choosing t6, followed by t7 and t8. It can be shown that all processes have resource threshold 2, and so that is also the resource threshold of the net.

In the rest of the section we obtain two negative results about the resource threshold. First, it is difficult to compute: determining if the resource threshold exceeds a given bound is NP-complete even for acyclic marked graphs, a very simple class of workflows. Second, we show that even for acyclic free-choice workflow nets the resource threshold may not be realized by any online scheduler.

#### **3.1 Resource Threshold Is NP-complete for Acyclic Marked Graphs**

We prove that deciding if the resource threshold exceeds a given bound is NP-complete even for acyclic sound marked graphs. The proof proceeds by reduction from the following classical scheduling problem, proved NP-complete in [18]:

**Given**: a finite, partially ordered set of jobs with non-negative integer durations, and non-negative integers t and k.

**Decide**: Can all jobs be executed with k machines within t time units in a way that respects the given partial order, i.e., a job is started only after all its predecessors have been finished?

More formally, the problem is defined as follows: Given jobs 𝒥 = {J<sub>1</sub>, …, J<sub>n</sub>}, where J<sub>i</sub> has duration τ(J<sub>i</sub>) for every 1 ≤ i ≤ n, and a partial order on 𝒥, does there exist a function f : 𝒥 → ℕ such that


These conditions are almost identical to the ones we used to define if a nonsequential process can be executed within time t with k resources. We exploit this to construct an acyclic workflow marked graph that "simulates" the scheduling problem. For the detailed proof, we refer to the full version of this paper [15].

**Theorem 1.** *The following problem is NP-complete:*

**Given:** *An acyclic, sound workflow marked graph* N*, and a number* k*.*

**Decide:** *Does* RT(N) ≤ k *hold?*

#### **3.2 Acyclic Free-Choice Workflow Nets May Have no Optimal Online Schedulers**

A resource threshold of k guarantees that every run *can* be executed without penalty with k resources. In other words, *there exists* a schedule that achieves optimal runtime. However, in many applications the schedule must be determined at runtime, that is, the resources must be allocated without knowing how choices will be resolved in the future. In order to formalize this idea we define the notion of an *online schedule* of a workflow net N.

**Definition 4.** *Let* N *be a Petri net, and let* Π *and* Π′ *be two processes of* (N,M)*. We say that* Π *is a* prefix *of* Π′*, denoted by* Π ⊑ Π′*, if there is a sequence* Π<sub>1</sub>, …, Π<sub>n</sub> *of processes such that* Π<sub>1</sub> = Π*,* Π<sub>n</sub> = Π′*, and* Π<sub>i+1</sub> *extends* Π<sub>i</sub> *by one transition for every* 1 ≤ i ≤ n − 1*.*

*Let* f *be a schedule of* (N,M)*, i.e., a function assigning a schedule to each process. We say that* f *is an* online schedule *if for every two runs* Π<sub>1</sub>, Π<sub>2</sub>*, and for every two prefixes* Π′<sub>1</sub> ⊑ Π<sub>1</sub> *and* Π′<sub>2</sub> ⊑ Π<sub>2</sub>*: If* Π′<sub>1</sub> *and* Π′<sub>2</sub> *are isomorphic, then* f(Π′<sub>1</sub>) = f(Π′<sub>2</sub>)*.*

Intuitively, if Π′<sub>1</sub> and Π′<sub>2</sub> are isomorphic then they are the same process Π, which in the future can be extended to either Π<sub>1</sub> or Π<sub>2</sub>, depending on which transitions occur. In an online schedule, Π is scheduled in the same way, independently of whether it will become Π<sub>1</sub> or Π<sub>2</sub> in the future. We show that even for acyclic free-choice workflow nets there may be no online schedule that realizes the resource threshold. That is, even though for every run it is possible to schedule the tasks with RT(N) resources to achieve optimal runtime, this requires knowing how the run will evolve before the execution of the workflow.

**Proposition 1.** *There is an acyclic, sound free-choice workflow net for which no online schedule realizes the resource threshold.*

**Fig. 3.** A workflow net with two runs. No online scheduler for three resources achieves the minimal runtime in both runs. (Color figure online)

*Proof.* Consider the sound free-choice workflow net (N,M<sub>I</sub>) of Fig. 3. It has two runs: Π<sub>g</sub>, which executes the grey and green transitions, and Π<sub>r</sub>, which executes the grey and red transitions. Their resource thresholds are RT(Π<sub>g</sub>) = RT(Π<sub>r</sub>) = 3, realized by the schedules f<sub>g</sub> and f<sub>r</sub> in Fig. 4:

**Fig. 4.** Schedules f<sub>g</sub> and f<sub>r</sub> for the two runs Π<sub>g</sub> and Π<sub>r</sub> of the net of Fig. 3.

Indeed, observe that f<sub>g</sub> and f<sub>r</sub> execute Π<sub>g</sub> and Π<sub>r</sub> within time 5, and even with unlimited resources no schedule can be faster because of the task p4, while two or fewer resources are insufficient to execute either run within time 5.

The schedule of (N,M<sub>I</sub>) that assigns f<sub>g</sub> and f<sub>r</sub> to Π<sub>g</sub> and Π<sub>r</sub> is not an online schedule. Indeed, the process containing one single transition labeled by t1 and places labeled by i, p1, p2, p3 is isomorphic to prefixes of Π<sub>g</sub> and Π<sub>r</sub>. However, we have f<sub>g</sub>(p3) = 0 ≠ 1 = f<sub>r</sub>(p3). We now claim:

(a) Every schedule f<sub>g</sub> of Π<sub>g</sub> that realizes the resource threshold (time 5 with 3 resources) satisfies f<sub>g</sub>(p3) = 0.

Indeed, if f<sub>g</sub>(p3) ≥ 1, then f<sub>g</sub>(p5) ≥ 3, f<sub>g</sub>(p9) ≥ 6, and finally f<sub>g</sub>(o) ≥ 6, so f<sub>g</sub> does not meet the time bound.

(b) Every schedule f<sub>r</sub> of Π<sub>r</sub> that realizes the resource threshold (time 5 with 3 resources) satisfies f<sub>r</sub>(p3) > 0.

Observe first that we necessarily have f<sub>r</sub>(p4) = 0, and so a resource, say R1, is bound to p4 during the complete execution of the workflow, leaving two resources. Assume f<sub>r</sub>(p3) = 0, i.e., a second resource, say R2, is bound to p3 at time 0, leaving one resource, say R3. Since both p1 and p2 must be executed before p8, and only R3 is free until time 2, we get f<sub>r</sub>(p8) ≥ 2. So at time 2 we still have to execute p6, p7, p8 with resources R2, R3. Therefore, two out of p6, p7, p8 must be executed sequentially by the same resource. Since p6, p7, p8 take 2 time units each, one of the two resources needs time 4, and we get f<sub>r</sub>(o) ≥ 6.

By this claim, at time 0, an online schedule has to decide whether to allocate a resource to p3 or not, without knowing which of t3 or t4 will be executed in the future. If it schedules f(p3) = 0 and later t4 occurs, then Π<sub>r</sub> is executed and the deadline of 5 time units is not met. The same occurs if it schedules f(p3) > 0, and later t3 occurs.

### **4 Concurrency Threshold**

Due to the two negative results presented in the previous section, we study a different parameter, introduced in [4], called the concurrency threshold. During execution of a business process, information on the resolution of future choices is often not available, and further no information on the possible duration of a task (or only weak bounds) is known. Therefore, in practice the scheduling is performed by assigning a resource to a task at the moment some resource becomes available. The question is: What is the minimal number of resources needed to guarantee the optimal execution time achievable with an unlimited number of resources?

The answer is simple: since there is no information about the duration of tasks, every reachable marking of the workflow net without durations may also be reached for some assignment of durations. Let M be a reachable marking with a maximal number of tokens, say k, in places with positive duration, and let d1 ≤ d2 ≤ ··· ≤ dk be the durations of their associated tasks. If fewer than k resources are available, and we do not assign a resource to the task with duration dk, we introduce a delay with respect to the case of an unlimited number of resources. Conversely, if the number of available resources is k, then the scheduler for k resources can always simulate the behaviour of the scheduler for an unlimited number of resources.

**Definition 5.** *Let* N = (P, T, F, I, O, τ) *be a workflow Petri net. For every marking* M *of* N*, define the* concurrency *of* M *as* conc(M) ≝ Σ<sub>p∈D</sub> M(p)*, where* D ⊆ P *is the set of places* p ∈ P *such that* τ(p) > 0*. The* concurrency threshold of N *is defined by*

$$\mathrm{CT}(N) \stackrel{\text{def}}{=} \max \left\{ \mathrm{conc}(M) \mid M \in \mathcal{R}^N(M_I) \right\}.$$

The following lemma follows easily from the definitions.

**Lemma 2.** *For every workflow net* N*:* RT(N) ≤ CT(N)*.*

*Proof.* Follows immediately from the fact that for every schedule f of a run of N, there is a schedule g with CT(N) machines such that g(p) ≤ f(p) for every place p.
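For small nets, the definition of the concurrency threshold can be evaluated directly by enumerating the reachable markings. The sketch below is a naive illustration and not the method studied in the following sections; the encoding of a net by `pre` and `post` dictionaries, the function name, and the example net are assumptions made for this example. The search terminates only if the set of reachable markings is finite.

```python
from collections import deque


def concurrency_threshold(pre, post, initial, timed_places):
    """Return max conc(M) over all markings M reachable from `initial`.

    pre[t] / post[t] are the sets of input / output places of transition t
    (all arcs have weight 1). Markings map places to token counts.
    """

    def freeze(marking):
        # Canonical, hashable form of a marking: drop empty places.
        return frozenset((p, n) for p, n in marking.items() if n > 0)

    def conc(frozen):
        # Number of tokens in places with positive duration.
        return sum(n for p, n in frozen if p in timed_places)

    start = freeze(initial)
    seen = {start}
    queue = deque([start])
    best = conc(start)
    while queue:
        current = dict(queue.popleft())
        for t in pre:
            if all(current.get(p, 0) >= 1 for p in pre[t]):  # t is enabled
                successor = dict(current)
                for p in pre[t]:
                    successor[p] -= 1
                for p in post[t]:
                    successor[p] = successor.get(p, 0) + 1
                frozen = freeze(successor)
                if frozen not in seen:
                    seen.add(frozen)
                    queue.append(frozen)
                    best = max(best, conc(frozen))
    return best


# Hypothetical workflow net: t1 forks into a and b, t4 joins c and d.
pre = {"t1": {"i"}, "t2": {"a"}, "t3": {"b"}, "t4": {"c", "d"}}
post = {"t1": {"a", "b"}, "t2": {"c"}, "t3": {"d"}, "t4": {"o"}}
initial = {"i": 1}
ct = concurrency_threshold(pre, post, initial, {"i", "a", "b", "c", "d"})
```

The number of reachable markings can grow exponentially in the size of the net, which is why the approaches based on the marking equation discussed next are of interest.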

In the rest of the paper we study the complexity of computing the concurrency threshold. In [4], it was shown that the threshold can be computed in polynomial time for regular workflows, a class with a very specific structure, and the problem for the general free-choice case was left open. In Sect. 4.1 we prove that the concurrency threshold of marked graphs can be computed in polynomial time by reduction to a linear programming problem over the rational numbers. In Sect. 4.2 we study the free-choice case. We show that deciding if the threshold exceeds a given value is NP-complete for acyclic, sound free-choice workflow nets. Further, it can be computed by solving the same linear programming problem as in the case of marked graphs, but over the integers. Finally, we show that in the cyclic case the problem remains NP-complete, but the integer linear programming problem does not necessarily yield the correct solution.

#### **4.1 Concurrency Threshold of Marked Graphs**

The concurrency threshold of marked graphs can be computed using a standard technique based on the *marking equation* [16]. Given a net N = (P, T, F), define the *incidence matrix* of N as the |P|×|T| matrix *N* given by:

$$\mathbf{N}(p,t) = \begin{cases} 1 & \text{if } p \in t^\bullet \setminus {}^\bullet t \\ -1 & \text{if } p \in {}^\bullet t \setminus t^\bullet \\ 0 & \text{otherwise} \end{cases}$$

In the following, we denote by *M* the representation of a marking M as a vector of dimension |P|. Let N be a Petri net, and let M1, M2 be markings of N. The following results are well known from the literature (see e.g. [16]):


Given a workflow net N = (P, T, F, I, O, τ), let *D* : P → ℕ be the vector defined by *D*(p) = 1 if p ∈ D and *D*(p) = 0 if p ∉ D, where D is the set of places with positive duration. We define the linear optimization problem

$$\ell^N = \max \left\{ \mathbf{D} \cdot \mathbf{M} \mid \mathbf{M} = \mathbf{M}\_I + \mathbf{N} \cdot \mathbf{X}, \mathbf{M} \ge 0, \mathbf{X} \ge 0 \right\} \tag{1}$$

Since the solutions of *M* = *M<sub>I</sub>* + *N* · *X* contain all the reachable markings of (N,M<sub>I</sub>), we have ℓ<sup>N</sup> ≥ CT(N). Further, using the results above, we obtain:

**Theorem 2.** *Let* N *be a workflow net, and let* ℓ<sup>N</sup><sub>ℚ</sub> *and* ℓ<sup>N</sup><sub>ℤ</sub> *be the solutions of the linear optimization problem (1) over the rationals and over the integers, respectively. We have:*

*–* ℓ<sup>N</sup><sub>ℚ</sub> ≥ ℓ<sup>N</sup><sub>ℤ</sub> ≥ CT(N)*;*

*– If* N *is a marked graph, then* ℓ<sup>N</sup><sub>ℚ</sub> = ℓ<sup>N</sup><sub>ℤ</sub> = CT(N)*.*

*– If* N *is acyclic, then* ℓ<sup>N</sup><sub>ℚ</sub> ≥ ℓ<sup>N</sup><sub>ℤ</sub> = CT(N)*.*

In particular, it follows that CT(N) can be computed in polynomial time for marked graphs, acyclic or not. (The result about acyclic nets is used in the next section.)
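To make the marking equation concrete, the following sketch builds the incidence matrix of a small net and evaluates *M<sub>I</sub>* + *N* · *X* for the vector X that counts how often each transition occurs in a firing sequence. It uses plain Python lists and does not call an LP solver; the function names and the example net are assumptions made for illustration.

```python
def incidence_matrix(places, transitions, pre, post):
    """N(p, t) = 1 if t only produces into p, -1 if t only consumes from p,
    and 0 otherwise. Rows are indexed by places, columns by transitions."""
    return [
        [int(p in post[t] and p not in pre[t]) - int(p in pre[t] and p not in post[t])
         for t in transitions]
        for p in places
    ]


def marking_equation(n_matrix, m_initial, x):
    """Return the vector M_I + N · X, indexed like the rows of n_matrix."""
    return [
        m_initial[i] + sum(row[j] * x[j] for j in range(len(x)))
        for i, row in enumerate(n_matrix)
    ]


# Hypothetical workflow net: t1 forks into a and b, t4 joins c and d.
places = ["i", "a", "b", "c", "d", "o"]
transitions = ["t1", "t2", "t3", "t4"]
pre = {"t1": {"i"}, "t2": {"a"}, "t3": {"b"}, "t4": {"c", "d"}}
post = {"t1": {"a", "b"}, "t2": {"c"}, "t3": {"d"}, "t4": {"o"}}

n_matrix = incidence_matrix(places, transitions, pre, post)
m_initial = [1, 0, 0, 0, 0, 0]          # one token in i
x = [1, 1, 0, 0]                        # t1 and t2 fire once each
m = marking_equation(n_matrix, m_initial, x)
d = [1, 1, 1, 1, 1, 0]                  # assumed: every place except o has a task
objective = sum(di * mi for di, mi in zip(d, m))
```

Here the resulting vector describes the marking with one token in b and one in c, which is indeed reached by firing t1 and then t2. Problem (1) maximizes the objective D · M over all non-negative X and M satisfying the equation, whether or not X corresponds to an actual firing sequence.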

#### **4.2 Concurrency Threshold of Free-Choice Nets**

We study the complexity of computing the concurrency threshold of free-choice workflow nets. We first show that, contrary to numerous other properties for which there are polynomial algorithms, deciding if the concurrency threshold exceeds a given value is NP-complete.

**Theorem 3.** *The following problem is NP-complete:*

**Given:** *A sound, free-choice workflow net* N = (P, T, F, I, O)*, and a number* k ≤ |T|*.*

**Decide:** *Is the concurrency threshold of* N *at least* k*?*

*Proof.* A detailed proof can be found in the full version of this paper [15]; here we only sketch the argument. Membership in NP is nontrivial, and follows from results of [1,7]. We prove NP-hardness by means of a reduction from Maximum Independent Set (MIS):

**Given**: An undirected graph G = (V,E), and a number k ≤ |V|.

**Decide**: Is there a set *In* ⊆ V such that |*In*| ≥ k and {v, u} ∉ E for every u, v ∈ *In*?

Given a graph G = (V,E), we construct a sound free-choice workflow net N<sub>G</sub> in polynomial time as follows:


**Fig. 5.** Gadgets for the proof of Theorem 3.

It is easy to see that N<sub>G</sub> is free-choice and sound. In [15] we show the result of applying the reduction to a small graph and prove that G has an independent set of size at least k iff the concurrency threshold of (N<sub>G</sub>, M<sub>I</sub>) is at least 2|E| + k. The intuition is that for each edge e ∈ E, we fire the transition [e, u]<sup>1</sup> where u ∉ *In*, and for each v ∈ *In*, we fire the transition v<sup>1</sup>, thus marking one of [e, u]<sup>2</sup> or [e, v]<sup>2</sup> for each edge e ∈ E and the place v<sup>2</sup> for each v ∈ *In*.
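The source problem of the reduction is easy to state in code. The brute-force check below decides the MIS instance defined above by trying all vertex subsets of size k; it only illustrates the decision problem and is not the reduction itself. The function name and the example graph are assumptions made for this sketch.

```python
from itertools import combinations


def has_independent_set(vertices, edges, k):
    """Decide whether the graph has an independent set of size at least k.

    Every subset of an independent set is independent, so it suffices
    to look for an independent set of size exactly k.
    """
    edge_set = {frozenset(e) for e in edges}
    return any(
        all(frozenset((u, v)) not in edge_set for u, v in combinations(candidate, 2))
        for candidate in combinations(vertices, k)
    )


# Hypothetical instance: the path a - b - c.
vertices = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c")]
```

On this path, {a, c} is an independent set of size 2 and no independent set of size 3 exists. The running time of the check grows with the number of k-element subsets of V, in line with the NP-hardness of the problem.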

#### **4.3 Approximating the Concurrency Threshold**

Recall that the solution of problem (1) over the rationals or the integers is always an upper bound on the concurrency threshold for any Petri net (Theorem 2). The question is whether any stronger result holds when the workflows are sound and free-choice. Since computing the concurrency threshold is NP-complete, we cannot expect the solution over the rationals, which is computable in polynomial time, to provide the exact value. However, it could still be the case that the solution over the integers is always exact. Unfortunately, this is not true, and we can prove the following results:

**Theorem 4.** *Given a Petri net* N*, let* ℓ<sup>N</sup><sub>ℚ</sub> *and* ℓ<sup>N</sup><sub>ℤ</sub> *be as in Theorem 2.*


*Proof.* For (a), we can take the net obtained by adding to the gadget in Fig. 5(a) a new transition with input places [e, v]<sup>4</sup> and [e, u]<sup>4</sup>, and an output place o with weight 2. We take e<sup>0</sup> as input place. The concurrency threshold is clearly 2, reached, for example, after firing [e, v]<sup>1</sup>. However, we have ℓ<sup>N</sup><sub>ℚ</sub> = 3, reached by the rational solution *X* = (1/2, 1/2, …, 1/2). Indeed, the marking equation then yields the marking M satisfying M([e, v]<sup>2</sup>) = M([e, u]<sup>2</sup>) = M(o) = 1/2.

For (b), we can take the workflow net of Fig. 6. It is easy to see that the concurrency threshold is equal to 1. The marking *M* that puts one token in each of the two places with weight 1, and no token in the rest of the places, is not reachable from M<sub>I</sub>. However, it is a solution of the marking equation, even when solved over the integers. Indeed, we have *M* = *M<sub>I</sub>* + *N* · *X* for *X* = (1, 0, 1, 1, 0, 0, 1). Therefore, the upper bound derived from the marking equation is 2.

**Fig. 6.** A sound free-choice workflow net for which the linear programming problem derived from the marking equation does not yield the exact value of the concurrency bound, even when solved over the integers.

### **5 Concurrency Threshold: A Practical Approach**

We have implemented a tool<sup>1</sup> to compute an upper bound on the concurrency threshold by constructing a linear program and solving it by calling the mixed-integer linear programming solver Cbc from the COIN-OR project [14]. Additionally, fixing a number k, we used the state-of-the-art Petri net model checker LoLA [19] both to establish a lower bound, by querying LoLA for the existence of a reachable marking M with conc(M) ≥ k, and to establish an upper bound, by querying LoLA whether all reachable markings M satisfy conc(M) ≤ k.

We evaluated the tool on a set of 1386 workflow nets extracted from a collection of five libraries of industrial business processes modeled in the IBM WebSphere Business Modeler [9]. For the concurrency threshold, we set D = P \ O. These nets also have multiple output places, however with a slightly different semantics for soundness allowing unmarked output places in the final marking. We applied the transformation described in [12] to ensure all output places will be marked in the final marking. This transformation preserves soundness and the concurrency threshold.

All of the 1386 nets in the benchmark libraries are free-choice nets. We selected the sound nets among them, of which there are 642. Out of those 642 nets, 409 are marked graphs. Out of the remaining 233 nets, 193 are acyclic and 40 are cyclic. We determined the exact concurrency threshold of all sound nets with LoLA using state-space exploration. Figure 7 shows the distribution of the threshold.

**Fig. 7.** Distribution of the concurrency threshold of the 642 nets analyzed.

On all 642 sound nets, we computed an upper bound on the concurrency threshold using our tool, both using rational and integer variables. We computed lower and upper bounds using LoLA with the value k = CT(N) of the concurrency threshold. We report the results for computing the lower and upper bound separately.

All experiments were performed on the same machine equipped with an Intel Core i7-6700K CPU and 32 GB of RAM. The results are shown in Table 1.

<sup>1</sup> The tool is available from https://gitlab.lrz.de/i7/macaw.

Using the linear program, we were able to compute an upper bound for all nets in total in less than 7 s, taking at most 30 ms for any single net. LoLA could compute the lower bound for all nets in 6 s. LoLA fails to compute the upper bound in three cases due to reaching the memory limit of 32 GB. For the remaining 639 nets, LoLA could compute the upper bound within 7 min in total.

We give a detailed analysis for the 9 nets with a state space of over one million. For three nets with state spaces of sizes 10<sup>9</sup>, 10<sup>10</sup> and 10<sup>17</sup>, LoLA reaches the memory limit. For four nets with state spaces between 10<sup>6</sup> and 10<sup>8</sup> and concurrency threshold above 25, LoLA takes 2, 10, 48 and 308 s each. For two nets with a state space of 10<sup>8</sup> and a concurrency threshold of just 11, LoLA can establish the upper bound in at most 20 ms. The solution of the linear program can be computed in all 9 cases in less than 30 ms.

**Table 1.** Statistics on the size and analysis time for the 642 nets analyzed. The times marked with <sup>∗</sup> exclude the 3 nets where LoLA reaches the memory limit.


Comparing the values of the upper bound, first we observed that we obtained the same value using either rational or integer variables. The time difference between both was however negligible. Second, quite surprisingly, we noticed that the upper bound obtained from the linear program is exact in all of our cases, even for the cyclic ones. Further, it can be computed much faster in several cases than the upper bound obtained by LoLA and it gives a bound in all cases, even when the state-space exploration reaches its limit. By combining linear programming for the upper bound and state-space exploration for the lower bound, an exact bound can always be computed within a few seconds.

### **6 Conclusion**

Planning sufficient execution resources for a business or production process is a crucial part of process engineering [3,13,20]. We considered a simple version of this problem in which resources are uniform and tasks are not interruptible. We studied the complexity of computing the resource threshold, i.e., the minimal number of resources allowing an optimal makespan. We showed that deciding if the resource threshold exceeds a given bound is NP-hard even for acyclic marked graphs. For this reason, we investigated the complexity of computing the concurrency threshold, an upper bound of the resource threshold introduced in [4]. Solving a problem left open in [4], we showed that deciding if the concurrency threshold exceeds a given bound is NP-hard for general sound free-choice workflow nets. We then presented a polynomial-time approximation algorithm, and showed experimentally that it computes the *exact* value of the concurrency threshold for all benchmarks of a standard suite of free-choice workflow nets.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Fine-Grained Complexity of Safety Verification**

Peter Chini, Roland Meyer, and Prakash Saivasan

> TU Braunschweig, Braunschweig, Germany {p.chini,roland.meyer,p.saivasan}@tu-bs.de

**Abstract.** We study the fine-grained complexity of Leader Contributor Reachability (LCR) and Bounded-Stage Reachability (BSR), two variants of the safety verification problem for shared-memory concurrent programs. For both problems, the memory is a single variable over a finite data domain. We contribute new verification algorithms and lower bounds based on the Exponential Time Hypothesis (ETH) and kernels.

LCR is the question whether a designated leader thread can reach an unsafe state when interacting with a certain number of equal contributor threads. We suggest two parameterizations: (1) by the size of the data domain D and the size of the leader L, and (2) by the size of the contributors C. We present two algorithms, running in O<sup>∗</sup>((L · (D + 1))<sup>L·D</sup> · D<sup>D</sup>) and O<sup>∗</sup>(4<sup>C</sup>) time, showing that both parameterizations are fixed-parameter tractable. Further, we suggest a modification of the first algorithm suitable for practical instances. The upper bounds are complemented by (matching) lower bounds based on ETH and kernels.

For BSR, we consider programs involving t different threads. We restrict the analysis to computations where the write permission changes *s* times between the threads. BSR asks whether a given configuration is reachable via such an *s*-stage computation. When parameterized by P, the maximum size of a thread, and t, the interesting observation is that the problem has a large number of difficult instances. Formally, we show that there is no polynomial kernel, that is, no compression algorithm that reduces D or *s* to a polynomial dependence on P and t. This indicates that symbolic methods may be harder to find for this problem.

A full version of the paper is available as [9].

### **1 Introduction**

We study the fine-grained complexity of two safety verification problems [1,16, 27] for shared-memory concurrent programs. The motivation to reconsider these problems are recent developments in fine-grained complexity theory [6,10,30,33]. They suggest that classifications such as NP or even FPT are too coarse to explain the success of verification methods. Instead, it should be possible to identify the precise influence that parameters of the input have on the verification time. Our contribution confirms this idea. We give new verification algorithms for the two problems that, for the first time, can be proven optimal in the sense of fine-grained complexity theory. To state the results, we need some background. As we proceed, we explain the development of fine-grained complexity theory.

There is a well-known gap between the success that verification tools see in practice and the judgments about computational hardness that worst-case complexity is able to give. The applicability of verification tools steadily increases by tuning them towards industrial instances. The complexity estimation is stuck with considering the input size (or at best assumes certain parameters to be constant, which does not mean much if the runtime is then n<sup>k</sup>, where n is the input size and k the parameter).

The observation of a gap between practical algorithms and complexity theory is not unique to verification but is made in every field that has to solve computationally hard problems. Complexity theory has taken up the challenge to close the gap. So-called *fixed-parameter tractability* (FPT) [11,13] proposes to identify parameters k so that the runtime is f(k) · *poly*(n), where f is a computable function. These parameters are powerful in the sense that they dominate the complexity.

For an FPT result to be useful, function f should only be mildly exponential, and of course k should be small in the instances of interest. Intuitively, they are what one needs to optimize. *Fine-grained complexity* is the study of upper and lower bounds on function f. Indeed, the fine-grained complexity of a problem is written as O<sup>∗</sup>(f(k)), emphasizing f and k and suppressing the polynomial part. For upper bounds, the approach is still to come up with an algorithm.
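A small numeric comparison shows why a runtime of n<sup>k</sup> is of little practical use even for constant k, while a runtime of the form f(k) · *poly*(n) can be acceptable. The concrete choices f(k) = 2<sup>k</sup> and *poly*(n) = n below are assumptions picked purely for illustration, as are the function names.

```python
def xp_steps(n, k):
    """Step count of a hypothetical algorithm running in n^k."""
    return n ** k


def fpt_steps(n, k):
    """Step count of a hypothetical algorithm running in 2^k * n."""
    return 2 ** k * n


# Compare both step counts for input size n = 1000 and growing parameter k.
for k in (2, 5, 10):
    print(k, xp_steps(1000, k), fpt_steps(1000, k))
```

For n = 1000 and k = 10 the first count is 10<sup>30</sup>, whereas the second is about one million, although both algorithms are polynomial for every fixed k.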

For lower bounds, fine-grained complexity has taken a new and very pragmatic perspective. For the problem of $n$-variable 3-SAT the best known algorithm runs in $2^n$, and this bound has not been improved since 1970. The idea is to take improvements on this problem as unlikely, known as the exponential-time hypothesis (ETH) [30]. ETH serves as a lower bound that is reduced to other problems [33]. An even stronger assumption about $n$-variable SAT, called SETH [6,30], and a similar one about *Set Cover* [10] allow for lower bounds like the absence of $(2 - \varepsilon)^n$ algorithms.

In this work, we contribute fine-grained complexity results for verification problems on concurrent programs. The first problem is reachability for a leader thread that is interacting with an unbounded number of contributors (LCR) [16,27]. We show that, assuming a parameterization by the size of the leader L and the size of the data domain D, the problem can be solved in $O^*((\mathrm{L} \cdot (\mathrm{D}+1))^{\mathrm{L} \cdot \mathrm{D}} \cdot \mathrm{D}^{\mathrm{D}})$. At the heart of the algorithm is a compression of computations into witnesses. To check reachability, our algorithm then iterates over candidates for witnesses and checks each of them for being a proper witness. Interestingly, we can formulate a variant of the algorithm that seems to be suited for large state spaces.

Using ETH, we show that the algorithm is (almost) optimal. Moreover, the problem is shown to have a large number of hard instances. Technically, there is no polynomial kernel [4,5]. Experience with kernel lower bounds is still limited. This notion of hardness seems to indicate that symbolic methods are hard to apply to the problem. The lower bounds that we present share similarities with the reductions from [7,24,25].

If we consider the size of the contributors a parameter, we obtain a singly exponential upper bound that we also prove to be tight. The saturation-based technique that we use is inspired by thread-modular reasoning [20,21,26,29].

The second problem we study generalizes bounded context switching. Bounded-stage reachability (BSR) asks whether a state is reachable if there is a bound s on the number of times the write permission is allowed to change between the threads [1]. Again, we show the new form of kernel lower bound. The result is tricky and highlights the power of the computation model.

The results are summarized by the table below. Two findings stand out; we highlight them in gray. We present a new algorithm for LCR. Moreover, we suggest kernel lower bounds as hardness indicators for verification problems. The lower bound for BSR is particularly difficult to achieve.


**Related Work.** Concurrent programs communicating through a shared memory and having a fixed number of threads have been extensively studied [2,14,22,28]. The leader contributor reachability problem as considered in this paper was introduced as parameterized reachability in [27]. In [16], it was shown to be NP-complete when only finite-state programs are involved and PSPACE-complete for recursive programs. In [31], the parameterized pairwise-reachability problem was considered and shown to be decidable. Parameterized reachability under a variant of round-robin scheduling was proven decidable in [32].

The bounded-stage restriction on the computations of concurrent programs as considered here was introduced in [1]. The corresponding reachability problem was shown to be NP-complete when only finite-state programs are involved. The problem remains in NEXP-time and PSPACE-hard for a combination of counters and a single pushdown. The bounded-stage restriction generalizes the concept of bounded context switching from [34], which was shown to be NP-complete in that paper. In [8], FPT algorithms for bounded context switching were obtained under various parameterizations. In [3], networks of pushdowns communicating through a shared memory were analyzed under various topological restrictions.

There have been few efforts to obtain fixed-parameter-tractable algorithms for automata and verification-related problems. FPT algorithms for automata problems have been studied in [18,19,35]. In [12], model-checking problems for synchronized executions on parallel components were considered and proven intractable. In [15], the notion of conflict serializability was introduced for the TSO memory model and an FPT algorithm for checking serializability was provided. The complexity of predicting atomicity violations on concurrent systems was considered in [17]. The finding is that FPT solutions are unlikely to exist.

#### **2 Preliminaries**

We introduce our model for programs, which is fairly standard and taken from [1, 16,27], and give the basics on fixed-parameter tractability.

**Programs.** A program consists of finitely many threads that access a shared memory. The memory is modeled to hold a single value at a time. Formally, a *(shared-memory) program* is a tuple $A = (D, a^0, (P_i)_{i \in [1..t]})$. Here, $D$ is the data domain of the memory and $a^0 \in D$ is the initial value. Threads are modeled as control-flow graphs that write values to or read values from the memory. These operations are captured by $\mathit{Op}(D) = \{!a, ?a \mid a \in D\}$. We use the notation $W(D) = \{!a \mid a \in D\}$ for the write operations and $R(D) = \{?a \mid a \in D\}$ for the read operations. A thread $P_{id}$ is a non-deterministic finite automaton $(\mathit{Op}(D), Q, q^0, \delta)$ over the alphabet of operations. The set of states is $Q$ with $q^0 \in Q$ the initial state. The final states will depend on the verification task. The transition relation is $\delta \subseteq Q \times (\mathit{Op}(D) \cup \{\varepsilon\}) \times Q$. We extend it to words and also write $q \xrightarrow{w} q'$ for $q' \in \delta(q, w)$. Whenever we need to distinguish between different threads, we add indices and write $Q_{id}$ or $\delta_{id}$.

The semantics of a program is given in terms of labeled transitions between configurations. A *configuration* is a pair $(\mathit{pc}, a) \in (Q_1 \times \cdots \times Q_t) \times D$. The program counter $\mathit{pc}$ is a vector that shows the current state $\mathit{pc}(i) \in Q_i$ of each thread $P_i$. Moreover, the configuration gives the current value in memory. We call $c^0 = (\mathit{pc}^0, a^0)$ with $\mathit{pc}^0(i) = q^0_i$ for all $i \in [1..t]$ the initial configuration. Let $C$ denote the set of all configurations. The transition relation among configurations $\to\ \subseteq C \times (\mathit{Op}(D) \cup \{\varepsilon\}) \times C$ is obtained by lifting the transition relations of the threads. To define it, let $\mathit{pc}_1 = \mathit{pc}[i = q_i]$, meaning thread $P_i$ is in state $q_i$ and otherwise the program counter coincides with $\mathit{pc}$. Let $\mathit{pc}_2 = \mathit{pc}[i = q'_i]$. If thread $P_i$ tries to read with the transition $q_i \xrightarrow{?a} q'_i$, then $(\mathit{pc}_1, a) \xrightarrow{?a} (\mathit{pc}_2, a)$. Note that the memory is required to hold the desired value. If the thread has the transition $q_i \xrightarrow{!b} q'_i$, then $(\mathit{pc}_1, a) \xrightarrow{!b} (\mathit{pc}_2, b)$. Finally, $q_i \xrightarrow{\varepsilon} q'_i$ yields $(\mathit{pc}_1, a) \xrightarrow{\varepsilon} (\mathit{pc}_2, a)$. The program's transition relation is generalized to words, $c \xrightarrow{w} c'$. We call such a sequence of consecutive labeled transitions a *computation*. To indicate that there is a word that justifies a computation from $c$ to $c'$, we write $c \to^* c'$. We may use an index $\xrightarrow{w}_i$ to indicate that the computation was induced by thread $P_i$. Where appropriate, we also use the program as an index, $\xrightarrow{w}_A$.
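The transition rules above can be illustrated by a small explicit-state exploration. The following Python sketch is not taken from the paper: the tuple encoding of transitions, the names `successors` and `reachable`, and the two-thread example are assumptions made purely for illustration, and the program has a fixed number of threads.

```python
from collections import deque

EPS = None  # label of an epsilon-transition (illustrative encoding)


def successors(threads, config):
    """Yield every configuration reachable from `config` in one step.

    `threads` is a list of transition lists. A transition is a tuple
    (source, label, target) with label ("!", a) for a write, ("?", a)
    for a read, or EPS. A configuration is (pc, value), where pc is a
    tuple holding the current state of each thread.
    """
    pc, value = config
    for i, transitions in enumerate(threads):
        for (src, label, dst) in transitions:
            if src != pc[i]:
                continue
            new_pc = pc[:i] + (dst,) + pc[i + 1:]
            if label is EPS:
                yield (new_pc, value)
            elif label[0] == "!":
                yield (new_pc, label[1])      # a write overwrites the memory
            elif label[0] == "?" and label[1] == value:
                yield (new_pc, value)         # a read needs the value in memory


def reachable(threads, initial_states, initial_value):
    """Return all configurations reachable from the initial configuration."""
    start = (tuple(initial_states), initial_value)
    seen = {start}
    queue = deque([start])
    while queue:
        config = queue.popleft()
        for nxt in successors(threads, config):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical example: thread 0 writes 1, thread 1 waits until it can read 1.
writer = [("w0", ("!", 1), "w1")]
reader = [("r0", ("?", 1), "r1")]
configs = reachable([writer, reader], ["w0", "r0"], 0)
```

In this example the reader can only move after the writer has stored 1, which reflects the requirement that the memory holds the value being read.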

**Fixed-Parameter Tractability.** We wish to study the fine-grained complexity of safety verification problems for the above programs. This means our goal is to identify parameters of these problems that have two properties. First, in practical instances they are small. Second, assuming that these parameters are small, there are efficient verification algorithms. *Parameterized complexity* makes precise the idea of an algorithm being efficient relative to a parameter.

A *parameterized problem* $L$ is a subset of $\Sigma^* \times \mathbb{N}$. The problem is *fixed-parameter tractable* if there is a deterministic algorithm that, given $(x, k) \in \Sigma^* \times \mathbb{N}$, decides $(x, k) \in L$ in time $f(k) \cdot |x|^{O(1)}$. We use FPT for the class of all fixed-parameter-tractable problems and say *a problem is* FPT to mean it is in that class. Note that $f$ is a computable function that only depends on the parameter $k$. It is common to denote the runtime by $O^*(f(k))$ and suppress the polynomial part. We will be interested in the precise dependence on the parameter, in upper and lower bounds on the function $f$. This study is often referred to as *fine-grained complexity*.

Lower bounds on $f$ are obtained by the *Exponential Time Hypothesis* (ETH). It assumes that there is no algorithm solving $n$-variable 3-SAT in $2^{o(n)}$ time. The reasoning is as follows: If $f$ dropped below a certain bound, ETH would fail.

While many parameterizations of NP-hard problems were proven to be fixed-parameter tractable, there are problems that are unlikely to be FPT. Such problems are hard for the complexity class W[1]. The appropriate notion of reduction for a theory of relative hardness in parameterized complexity is called *parameterized reduction*.

### **3 Leader Contributor Reachability**

We consider the *leader contributor reachability problem* for shared-memory programs. The problem was introduced in [27] and shown to be NP-complete in [16] for the finite-state case.<sup>1</sup> We contribute two new verification algorithms that target two parameterizations of the problem. In both cases, our algorithms establish fixed-parameter tractability. Moreover, with matching lower bounds we prove them to be optimal even in the fine-grained sense.

An instance of the leader contributor reachability problem is given by a shared-memory program of the form $A = (D, a^0, (P_L, (P_i)_{i \in [1..t]}))$. The program has a designated *leader* thread $P_L$ and several *contributor* threads $P_1, \ldots, P_t$. In addition, we are given a set of unsafe states for the leader. The task is to check whether the leader can reach an unsafe state when interacting with a number of instances of the contributors. It is worth noting that the problem can be reduced to having a single contributor. Let the corresponding thread $P_C$ be the union of $P_1, \ldots, P_t$ (constructed using an initial $\varepsilon$-transition). We base our complexity analysis on this simplified formulation of the problem.
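The reduction to a single contributor can be sketched as follows. The Python function below is only an illustration under assumed encodings: transitions are tuples (source, label, target) with `None` standing for ε, and the name `union_contributor` and the fresh state `"init_C"` are hypothetical.

```python
def union_contributor(contributors, fresh_initial="init_C"):
    """Merge contributor automata into one automaton playing the role of P_C.

    Each contributor is a pair (initial_state, transitions). States are
    tagged with the contributor index so that the state sets stay disjoint.
    """
    transitions = []
    for idx, (initial, trans) in enumerate(contributors):
        # The fresh initial state picks a contributor via an epsilon-move.
        transitions.append((fresh_initial, None, (idx, initial)))
        for (src, label, dst) in trans:
            transitions.append(((idx, src), label, (idx, dst)))
    return fresh_initial, transitions


# Hypothetical contributors P_1 and P_2.
p1 = ("a0", [("a0", ("!", "x"), "a1")])
p2 = ("b0", [("b0", ("?", "x"), "b1")])
init_c, trans_c = union_contributor([p1, p2])
```

Every copy of the merged thread first commits to one of the original contributors and then behaves exactly like it, so instances of the merged thread simulate instances of $P_1, \ldots, P_t$.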

For the definition, let $A = (D, a^0, (P_L, P_C))$ be a program with two threads. Let $F_L \subseteq Q_L$ be a set of unsafe states of the leader. For $t \in \mathbb{N}$, define the program $A^t = (D, a^0, (P_L, (P_C)_{i \in [1..t]}))$ to have $t$ copies of $P_C$. Further, let $C^f$ be the set of configurations where the leader is in an unsafe state (from $F_L$). The problem of interest is as follows:

*Leader Contributor Reachability* (LCR)

**Input:** A program $A = (D, a^0, (P_L, P_C))$ and a set of states $F_L \subseteq Q_L$.

**Question:** Is there a $t \in \mathbb{N}$ such that $c^0 \to^*_{A^t} c$ for some $c \in C^f$?

<sup>1</sup> The problem is called parameterized reachability in these works. We renamed it to avoid confusion with parameterized complexity.

We consider two parameterizations of LCR. First, we parameterize by D, the size of the data domain *D*, and L, the number of states of the leader PL. We denote the parameterization by LCR(D, L). While for LCR(D, L) we obtain an FPT algorithm, it is not likely that LCR(D) and LCR(L) admit the same. These parameterizations are W[1]-hard. For details, we refer to the full version [9].

The second parameterization that we consider is LCR(C), a parameterization by the number of states of the contributor $P_C$. We prove that the parameter is enough to obtain an FPT algorithm.

#### **3.1 Parameterization by Memory and Leader**

We give an algorithm that solves LCR in time $O^*((\mathrm{L} \cdot (\mathrm{D}+1))^{\mathrm{L} \cdot \mathrm{D}} \cdot \mathrm{D}^{\mathrm{D}})$, which means LCR(D, L) is FPT. We then show how to modify the algorithm to solve instances of LCR as they are likely to occur in practice. Interestingly, the modified version of the algorithm lends itself to an efficient implementation based on off-the-shelf sequential model checkers. We conclude with lower bounds for LCR(D, L).

**Upper Bound.** We give an algorithm for the parameterization LCR(D, L). The key idea is to compactly represent computations that may be present in an instance of the given program. To this end, we introduce a domain of so-called witness candidates. The main technical result, Lemma 4, links computations and witness candidates. It shows that reachability of an unsafe state holds in an instance of the program if and only if there is a witness candidate that is valid (in a precise sense). With this, our algorithm iterates over all witness candidates and checks each of them for being valid. To state the overall result, let $\mathit{Wit}(\mathrm{L}, \mathrm{D}) = (\mathrm{L} \cdot (\mathrm{D}+1))^{\mathrm{L} \cdot \mathrm{D}} \cdot \mathrm{D}^{\mathrm{D}} \cdot \mathrm{L}$ be the number of witness candidates and let $\mathit{Valid}(\mathrm{L}, \mathrm{D}, \mathrm{C}) = \mathrm{L}^3 \cdot \mathrm{D}^2 \cdot \mathrm{C}^2$ be the time it takes to check validity of a candidate. Note that the latter is polynomial.

**Theorem 1.** LCR *can be solved in time* $O(\mathit{Wit}(\mathrm{L}, \mathrm{D}) \cdot \mathit{Valid}(\mathrm{L}, \mathrm{D}, \mathrm{C}))$*.*

Let $A = (D, a^0, (P_L, P_C))$ be the program of interest and $F_L$ be the set of unsafe states in the leader. Assume we are given a computation $\rho$ showing that $P_L$ can reach a state in $F_L$ when interacting with a number of contributors. We explain the main ideas to find an efficient representation for $\rho$ that still allows for the reconstruction of a similar computation. To simplify the presentation, we assume the leader never writes a value ($!a$) and immediately reads ($?a$) that same value. This is without loss of generality: such a read can be replaced by $\varepsilon$.

In a first step, we delete most of the moves in $\rho$ that were carried out by contributors. We only keep *first writes*. For each value $a$, this is the write transition $\mathit{fw}(a) = c \xrightarrow{!a} c'$ where $a$ is written by a contributor for the first time. The reason we can omit subsequent writes of $a$ is the following: If $\mathit{fw}(a)$ is carried out by contributor $P_1$, we can assume that there is an arbitrary number of other contributors that all mimicked the behavior of $P_1$. This means whenever $P_1$ did a transition, they copycatted it right away. Hence, there are arbitrarily many contributors pending to write $a$. Phrased differently, the symbol $a$ is available for the leader whenever $P_L$ needs to read it. The idea goes back to the *Copycat Lemma* stated in [16]. The reads of the contributors are omitted as well. We will make sure they can be served by the first writes and the moves done by $P_L$.
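As an illustration of this deletion step, the following Python sketch filters a computation given as a list of moves. The move encoding (thread name, kind, value) and the function name `keep_first_writes` are assumptions for this example only.

```python
def keep_first_writes(computation, leader="L"):
    """Drop contributor moves from a computation, except for first writes.

    `computation` is a list of moves (thread, kind, value) with kind in
    {"!", "?", "eps"}. Leader moves are kept unchanged. For contributors,
    only the first write of each value survives; reads, epsilon-moves and
    repeated writes are removed.
    """
    written = set()      # values already written by some contributor
    reduced = []
    for thread, kind, value in computation:
        if thread == leader:
            reduced.append((thread, kind, value))
        elif kind == "!" and value not in written:
            written.add(value)
            reduced.append((thread, kind, value))
    return reduced


# Hypothetical computation with two contributors C1 and C2.
rho = [
    ("C1", "!", "a"), ("L", "?", "a"), ("C2", "!", "a"),
    ("C2", "?", "a"), ("L", "!", "b"), ("C1", "!", "b"),
]
rho_reduced = keep_first_writes(rho)
```

Note that the write of `b` by `C1` is kept although the leader wrote `b` earlier, since first writes are defined with respect to contributors only.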

After the deletion, we are left with a shorter expression $\rho'$. We turn it into a word $w$ over the alphabet $Q_L \cup D_\bot \cup \bar{D}$ with $D_\bot = D \cup \{\bot\}$ and $\bar{D} = \{\bar{a} \mid a \in D\}$. Each transition $c \xrightarrow{!a/?a/\varepsilon}_L c'$ in $\rho'$ that is due to the leader moving from $q$ to $q'$ is mapped (i) to $q.a.q'$ if it is a write and (ii) to $q.\bot.q'$ otherwise. A first write $\mathit{fw}(a) = c \xrightarrow{!a} c'$ of a contributor is mapped to $\bar{a}$. We may assume that the resulting word $w$ is of the form $w = w_1.w_2$ with $w_1 \in ((Q_L.D_\bot)^*.\bar{D})^*$ and $w_2 \in (Q_L.D_\bot)^*.F_L$. Note that $w$ can still be of unbounded length.

In order to find a witness of bounded length, we compress $w_1$ and $w_2$ to $w'_1$ and $w'_2$. Between two first writes $\bar{a}$ and $\bar{b}$ in $w_1$, the leader can perform an unbounded number of transitions, represented by a word in $(Q_L.D_\bot)^*$. Hence, there are states $q \in Q_L$ repeating between $\bar{a}$ and $\bar{b}$. We contract the word between the first and the last occurrence of $q$ into just a single state $q$. This state now represents a loop on $P_L$. Since there are L states in the leader, this bounds the number of contractions. Furthermore, we know that the number of first writes is bounded by D, since each symbol can be written for the first time at most once. Thus, the compressed string $w'_1$ is in the language $((Q_L.D_\bot)^{\le \mathrm{L}}.\bar{D})^{\le \mathrm{D}}$.
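The contraction of repeated states can be sketched as below. This is a minimal Python illustration, assuming a segment between two first writes is given as a list of pairs (state, symbol); keeping the symbol that follows the last occurrence of a state is a choice made for this sketch, and the name `contract_loops` is hypothetical.

```python
def contract_loops(segment):
    """Contract a leader segment so that every state occurs at most once.

    `segment` is a list of pairs (state, symbol) read off a word in
    (Q_L . D_bot)^*. Whenever a state repeats, everything between its
    first and its last occurrence is dropped and the state is kept once.
    """
    compressed = []
    i = 0
    while i < len(segment):
        state = segment[i][0]
        # Index of the last occurrence of `state` from position i onwards.
        last = max(j for j in range(i, len(segment)) if segment[j][0] == state)
        compressed.append(segment[last])
        i = last + 1
    return compressed


# Hypothetical segment in which q0 repeats; "bot" stands for the symbol ⊥.
segment = [("q0", "bot"), ("q1", "a"), ("q0", "bot"),
           ("q2", "b"), ("q1", "bot"), ("q3", "bot")]
short = contract_loops(segment)
```

After the contraction no state occurs twice, so the result has at most L entries, which matches the exponent $\le \mathrm{L}$ in the language above.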

The word $w_2$ is of the form $w_2 = q.u$ for a state $q \in Q_L$ and a word $u$. We truncate the word $u$ and only keep the state $q$. Then we know that there is a computation leading from $q$ to a state in $F_L$ where $P_L$ can potentially write any symbol but read only those symbols which occurred as a first write in $w'_1$. Altogether, we are left with a word of bounded length.

**Definition 2.** *The set of witness candidates is* $E = ((Q_L.D_\bot)^{\le \mathrm{L}}.\bar{D})^{\le \mathrm{D}}.Q_L$*.*

To characterize computations in terms of witness candidates, we define the notion of validity. This needs some notation. Consider a word $w = w_1 \ldots w_\ell$ over some alphabet $\Gamma$. For $i \in [1..\ell]$, we set $w[i] = w_i$ and $w[1..i] = w_1 \ldots w_i$. If $\Gamma' \subseteq \Gamma$, we use $w{\downarrow}_{\Gamma'}$ for the projection of $w$ to the letters in $\Gamma'$.

Consider a witness candidate $w \in E$ and let $i \in [1..|w|]$. We use $\bar{D}(w, i)$ for the set of all first writes that occurred in $w$ up to position $i$. Formally, $\bar{D}(w, i) = \{a \mid \bar{a} \text{ is a letter in } w[1..i]{\downarrow}_{\bar{D}}\}$. We abbreviate $\bar{D}(w, |w|)$ as $\bar{D}(w)$. Let $q \in Q_L$ and $S \subseteq D$. Recall that the state represents a loop in $P_L$. The set of all letters written within a loop from $q$ to $q$ when reading only symbols from $S$ is $\mathrm{Loop}(q, S) = \{a \mid a \in D \text{ and } \exists v_1, v_2 \in (W(D) \cup R(S))^* : q \xrightarrow{v_1 !a v_2}_L q\}$.
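One way to compute $\mathrm{Loop}(q, S)$ is by two reachability searches on the leader restricted to the permitted operations. The Python sketch below is an illustration, not the paper's procedure: the transition encoding and the name `loop_writes` are assumptions, and ε-transitions are treated as always usable since they contribute no letters.

```python
def loop_writes(transitions, q, S):
    """Compute Loop(q, S) for a leader automaton.

    `transitions` is a list of (source, label, target) with label
    ("!", a), ("?", a), or None for epsilon. Only writes, epsilon-moves
    and reads of symbols in S may be used on the loop.
    """
    usable = [(s, l, t) for (s, l, t) in transitions
              if l is None or l[0] == "!" or l[1] in S]

    def closure(start, edges):
        """States reachable from `start` along the directed `edges`."""
        seen, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for (u, v) in edges:
                if u == node and v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen

    forward = closure(q, [(s, t) for (s, _, t) in usable])
    backward = closure(q, [(t, s) for (s, _, t) in usable])
    # A write !a lies on a loop through q iff its source is reachable
    # from q and q is reachable from its target.
    return {l[1] for (s, l, t) in usable
            if l is not None and l[0] == "!" and s in forward and t in backward}


# Hypothetical leader: the loop q0 -> q1 -> q0 needs a read of b.
leader = [("q0", ("!", "a"), "q1"), ("q1", ("?", "b"), "q0"),
          ("q1", ("!", "c"), "q2")]
```

In the example, `a` is written on a loop through `q0` only if `b` may be read, while `c` leads to a state from which `q0` is no longer reachable.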

The definition of validity is given next. The three requirements are made precise in the text below.

**Definition 3.** *A witness candidate* w ∈ E *is* valid *if it satisfies the following properties: (1) First writes are unique. (2) The word* w *encodes a run in* PL*. (3) There are supportive computations on the contributors.*


Alternatively, there is a read $q_i \xrightarrow{?a}_L q_{i+1}$ of a symbol $a \in \bar{D}(w, \mathrm{pos}(a_i))$ that already occurred within a first write (the leader does not read its own writes). Here, we use $\mathrm{pos}(a_i)$ to access the position of $a_i$ in $w$. State $q_1 = q^0_L$ is initial. There is a run from $q_{\ell+1}$ to a state $q_f \in F_L$. During this run, reading is restricted to symbols that occurred as first writes in $w$. Formally, there is a $v \in (W(D) \cup R(\bar{D}(w)))^*$ such that $q_{\ell+1} \xrightarrow{v}_L q_f$.

(3) For each prefix $v\bar{a}$ of $w$ with $\bar{a} \in \bar{D}$ there is a computation $q^0_C \xrightarrow{u!a}_C q$ on $P_C$ so that the reads in $u$ can be obtained from $v$. Formally, let $u' = u{\downarrow}_{R(D)}$. Then there is an embedding of $u'$ into $v$, a monotone map $\mu : [1..|u'|] \to [1..|v|]$ that satisfies the following. Let $u'[i] = {?a}$ with $a \in D$. The read is served in one of the following three ways. We may have $v[\mu(i)] = a$, which corresponds to a write of $a$ by $P_L$. Alternatively, $v[\mu(i)] = q \in Q_L$ and $a \in \mathrm{Loop}(q, \bar{D}(w, \mu(i)))$. This amounts to reading from a leader's write that was executed in a loop. Finally, we may have $a \in \bar{D}(w, \mu(i))$, corresponding to reading from another contributor.

**Lemma 4.** *There is a* $t \in \mathbb{N}$ *so that* $c^0 \to^*_{A^t} c$ *with* $c \in C^f$ *if and only if there is a valid witness candidate* $w \in E$*.*

Our algorithm iterates over all witness candidates $w \in E$ and tests whether $w$ is valid. The number of candidates $\mathit{Wit}(\mathrm{L}, \mathrm{D})$ is given by $(\mathrm{L} \cdot (\mathrm{D}+1))^{\mathrm{L} \cdot \mathrm{D}} \cdot \mathrm{D}^{\mathrm{D}} \cdot \mathrm{L}$. This is due to the fact that we can force a witness candidate to have maximum length via inserting padding symbols. The number of candidates constitutes the first factor of the runtime stated in Theorem 1. The polynomial factor $\mathit{Valid}(\mathrm{L}, \mathrm{D}, \mathrm{C})$ is due to the following lemma. Details are given in the full version of the paper [9].

**Lemma 5.** *Validity of* $w \in E$ *can be checked in time* $O(\mathrm{L}^3 \cdot \mathrm{D}^2 \cdot \mathrm{C}^2)$*.*
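To get a feeling for the two factors in Theorem 1, the formulas for the number of candidates and for the cost of one validity check can be evaluated directly. The Python sketch below only restates these formulas; the function names `wit` and `valid_cost` are chosen for illustration.

```python
def wit(L, D):
    """Number of witness candidates: (L * (D + 1))^(L * D) * D^D * L."""
    return (L * (D + 1)) ** (L * D) * D ** D * L


def valid_cost(L, D, C):
    """Bound on the time for one validity check: L^3 * D^2 * C^2."""
    return L ** 3 * D ** 2 * C ** 2


def theorem1_bound(L, D, C):
    """Product of both factors, the runtime bound of Theorem 1 up to constants."""
    return wit(L, D) * valid_cost(L, D, C)
```

The exponential part depends only on L and D, while the number of contributor states C enters only through the polynomial factor.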

**Practical Algorithm.** We improve the above algorithm so that it should work well on practical instances. The idea is to factorize the leader along its *strongly connected components* (SCCs), the number of which is assumed to be small in real programs. Technically, our improved algorithm works with *valid SCC-witnesses*. They symbolically represent SCCs rather than loops in the leader. To state the complexity, we define the *straight-line depth*, the number of SCCs the leader may visit during a computation. The definition needs a graph construction.

Let $V \subseteq \bar{D}^{\le \mathrm{D}}$ contain only words that do not repeat letters. Let $r = \bar{c}_1 \ldots \bar{c}_\ell \in V$ and $i \in [0..\ell]$. By $P_L{\downarrow}_i$ we denote the automaton obtained from $P_L$ by removing all transitions that read a value outside $\{c_1, \ldots, c_i\}$. Let $\mathrm{SCC}(P_L{\downarrow}_i)$ denote the set of all SCCs in this automaton. We construct the directed graph $G(P_L, r)$ as follows. The vertices are the SCCs of all $P_L{\downarrow}_i$, $i \in [0..\ell]$. There is an edge between $S, S' \in \mathrm{SCC}(P_L{\downarrow}_i)$ if there are states $q \in S$, $q' \in S'$ with $q \to q'$ in $P_L{\downarrow}_i$. If $S \in \mathrm{SCC}(P_L{\downarrow}_{i-1})$ and $S' \in \mathrm{SCC}(P_L{\downarrow}_i)$, we only get an edge if we can get from $S$ to $S'$ by reading $c_i$. Note that the graph is acyclic.
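The two building blocks of this construction, the restricted automaton $P_L{\downarrow}_i$ and its SCCs, can be sketched in Python as follows. The encoding of transitions and the names `restrict` and `sccs` are assumptions for this illustration; the SCC computation uses Kosaraju's algorithm, which is a standard choice and not prescribed by the paper.

```python
def restrict(transitions, allowed_reads):
    """Remove all transitions that read a value outside `allowed_reads`."""
    return [(s, l, t) for (s, l, t) in transitions
            if l is None or l[0] != "?" or l[1] in allowed_reads]


def sccs(states, transitions):
    """Return the SCCs of the automaton as a list of frozensets."""
    succ = {q: [] for q in states}
    pred = {q: [] for q in states}
    for (s, _, t) in transitions:
        succ[s].append(t)
        pred[t].append(s)

    order, seen = [], set()
    for root in states:                     # first pass: finishing order
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(succ[root]))]
        while stack:
            node, it = stack[-1]
            nxt = next((v for v in it if v not in seen), None)
            if nxt is None:
                order.append(node)
                stack.pop()
            else:
                seen.add(nxt)
                stack.append((nxt, iter(succ[nxt])))

    components, assigned = [], set()
    for root in reversed(order):            # second pass on the reversed graph
        if root in assigned:
            continue
        comp, stack = set(), [root]
        assigned.add(root)
        while stack:
            node = stack.pop()
            comp.add(node)
            for v in pred[node]:
                if v not in assigned:
                    assigned.add(v)
                    stack.append(v)
        components.append(frozenset(comp))
    return components


# Hypothetical leader: the loop between q0 and q1 requires reading c.
leader_states = ["q0", "q1", "q2"]
leader_trans = [("q0", ("!", "a"), "q1"), ("q1", ("?", "c"), "q0"),
                ("q1", None, "q2")]
```

Without any permitted reads the example leader falls apart into three trivial SCCs. Once `c` may be read, `q0` and `q1` merge into one component, which shows how SCCs grow along the sequence $P_L{\downarrow}_0, P_L{\downarrow}_1, \ldots$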

The depth $d(r)$ of $P_L$ relative to $r$ is the length of the longest path in $G(P_L, r)$. The *straight-line depth* is $d = \max\{d(r) \mid r \in V\}$. The *number of SCCs* $s$ is the size of $\mathrm{SCC}(P_L{\downarrow}_0)$. With these values at hand, the number of SCC-witness candidates (the definition of which can be found in the full version [9]) can be bounded by $\mathit{Wit}_{\mathit{SCC}}(s, \mathrm{D}, d) \le (s \cdot (\mathrm{D}+1))^{d} \cdot \mathrm{D}^{\mathrm{D}} \cdot 2^{\mathrm{D}+d}$. The time needed to test whether a candidate is valid is $\mathit{Valid}_{\mathit{SCC}}(\mathrm{L}, \mathrm{D}, \mathrm{C}, d) = \mathrm{L}^2 \cdot \mathrm{D} \cdot \mathrm{C}^2 \cdot d^2$.

**Theorem 6.** LCR *can be solved in time* $O(\mathit{Wit}_{\mathit{SCC}}(s, \mathrm{D}, d) \cdot \mathit{Valid}_{\mathit{SCC}}(\mathrm{L}, \mathrm{D}, \mathrm{C}, d))$*.*

For this algorithm, what matters is that the leader's state space is strongly connected. The number of states has limited impact on the runtime.

**Lower Bound.** We prove that the algorithm from Theorem 1 is only a root factor away from being optimal: A $2^{o(\sqrt{\mathrm{L} \cdot \mathrm{D}} \cdot \log(\mathrm{L} \cdot \mathrm{D}))}$-time algorithm for LCR would contradict ETH. We achieve the lower bound by a reduction from $k \times k$ Clique, the problem of finding a clique of size $k$ in a graph whose vertices are the elements of a $k \times k$ matrix. Moreover, the clique has to contain one vertex from each row. Unless ETH fails, the problem cannot be solved in time $2^{o(k \cdot \log(k))}$ [33].

Technically, we construct from an instance $(G, k)$ of $k \times k$ Clique an instance $(A = (D, a^0, (P_L, P_C)), F_L)$ of LCR such that $\mathrm{D} = O(k)$ and $\mathrm{L} = O(k)$. Furthermore, we show that $G$ contains the desired clique of size $k$ if and only if there is a $t \in \mathbb{N}$ such that $c^0 \to^*_{A^t} c$ with $c \in C^f$. Suppose we had an algorithm for LCR running in time $2^{o(\sqrt{\mathrm{L} \cdot \mathrm{D}} \cdot \log(\mathrm{L} \cdot \mathrm{D}))}$. Combined with the reduction, this would yield an algorithm for $k \times k$ Clique with runtime $2^{o(\sqrt{k^2} \cdot \log(k^2))} = 2^{o(k \cdot \log k)}$. But unless ETH fails, such an algorithm cannot exist.
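The arithmetic behind the last step can be spelled out. With $\mathrm{D} = O(k)$ and $\mathrm{L} = O(k)$ we have $\mathrm{L} \cdot \mathrm{D} = O(k^2)$, and constant factors are absorbed by the $O$-notation:

```latex
\sqrt{\mathrm{L}\cdot\mathrm{D}}\cdot\log(\mathrm{L}\cdot\mathrm{D})
  \;=\; O\bigl(\sqrt{k^{2}}\cdot\log(k^{2})\bigr)
  \;=\; O\bigl(k\cdot 2\log k\bigr)
  \;=\; O(k\cdot\log k)
```

Hence a runtime of $2^{o(\sqrt{\mathrm{L} \cdot \mathrm{D}} \cdot \log(\mathrm{L} \cdot \mathrm{D}))}$ for LCR translates into $2^{o(k \cdot \log k)}$ for $k \times k$ Clique. The square root is exactly what is lost by encoding an instance with parameter $k$ into two parameters of size $O(k)$ each.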

**Proposition 7.** LCR *cannot be solved in time* $2^{o(\sqrt{\mathrm{L} \cdot \mathrm{D}} \cdot \log(\mathrm{L} \cdot \mathrm{D}))}$ *unless* ETH *fails.*

We assume that the vertices $V$ of $G$ are given by tuples $(i, j)$ with $i, j \in [1..k]$, where $i$ denotes the row and $j$ denotes the column. In the reduction, we need the leader and the contributors to communicate on the vertices of $G$. However, we cannot store tuples $(i, j)$ in the memory as this would cause a quadratic blow-up $\mathrm{D} = O(k^2)$. Instead, we communicate a vertex $(i, j)$ as a string $\mathrm{row}(i).\mathrm{col}(j)$. We distinguish between row and column symbols to avoid stuttering, the repeated reading of the same symbol. With this, it cannot happen that a thread reads a row symbol twice and takes it for a column.
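The effect of this encoding on the size of the data domain can be made concrete. The Python sketch below is illustrative only: representing symbols as tagged pairs and the names `memory_symbols`, `encode_vertex` and `decode_vertex` are assumptions, and the special symbols used later in the reduction are not modeled.

```python
def memory_symbols(k):
    """Row and column symbols for a k x k vertex matrix: 2 * k symbols."""
    rows = [("row", i) for i in range(1, k + 1)]
    cols = [("col", j) for j in range(1, k + 1)]
    return rows + cols


def encode_vertex(i, j):
    """Transmit vertex (i, j) as the two-letter string row(i).col(j)."""
    return [("row", i), ("col", j)]


def decode_vertex(symbols):
    """Rebuild (i, j); reject anything but a row symbol followed by a column symbol."""
    if len(symbols) != 2 or symbols[0][0] != "row" or symbols[1][0] != "col":
        return None
    return (symbols[0][1], symbols[1][1])
```

For $k = 5$ this needs 10 symbols instead of 25 tuples. Because rows and columns are tagged differently, a row symbol read twice is never mistaken for a vertex.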

The program starts its computation with each contributor choosing a vertex (i, j) to store. For simplicity, we denote a contributor storing (i, j) by P(i,j). Note that there can be copies of P(i,j).

Since there are arbitrarily many contributors, the chosen vertices are only a superset of the clique we want to find. To cut away the false vertices, the leader $P_L$ guesses for each row the vertex belonging to the clique. To this end, the program performs for each $i \in [1..k]$ the following steps: If $(i, j_i)$ is the vertex of interest, $P_L$ first writes $\mathrm{row}(i)$ to the memory. Each contributor that is still active reads the symbol and advances by one state. Then $P_L$ communicates the column by writing $\mathrm{col}(j_i)$. Again, the active contributors $P_{(i',j')}$ read.

A contributor $P_{(i',j')}$ can react to the read symbol in three different ways: (1) If $i' \neq i$, the contributor $P_{(i',j')}$ stores a vertex of a different row. The computation in $P_{(i',j')}$ can only go on if $(i', j')$ is connected to $(i, j_i)$ in $G$. Otherwise it will stop. (2) If $i' = i$ and $j' = j_i$, then $P_{(i',j')}$ stores exactly the vertex guessed by $P_L$. In this case, $P_{(i',j')}$ can continue its computation. (3) If $i' = i$ and $j' \neq j_i$, thread $P_{(i',j')}$ stores a different vertex from row $i$. The contributor has to stop its computation.

After $k$ such rounds, there are only contributors left that store vertices guessed by $P_L$. Furthermore, each two of these vertices are connected. Hence, they form a clique. To transmit this information to $P_L$, each $P_{(i,j_i)}$ writes $\#_i$ to the memory, a special symbol for row $i$. After $P_L$ has read the string $\#_1 \ldots \#_k$, it moves to its final state. A formal construction can be found in the full version [9].

**Absence of a Polynomial Kernel.** A kernelization of a parameterized problem is a compression algorithm. Given an instance, it returns an equivalent instance the size of which is bounded by a function only in the parameter. From an algorithmic perspective, kernels put a bound on the number of hard instances of the problem. Indeed, the search for small kernels is a key interest in algorithmics, similar to the search for fast FPT algorithms. Even more, it can be shown that kernels exist if and only if a problem admits an FPT algorithm [11].

Let $Q$ be a parameterized problem. A *kernelization* of $Q$ is an algorithm that transforms, in polynomial time, a given instance $(B, k)$ into an equivalent instance $(B', k')$ such that $|B'| + k' \le g(k)$, where $g$ is a computable function. If $g$ is a polynomial, we say that $Q$ admits a *polynomial kernel*.

Unfortunately, for many problems the community failed to come up with polynomial kernels. This led to the contrary approach, namely disproving their existence [4,5,23]. Such a result constitutes an exponential lower bound on the number of hard instances. Like computational hardness results, such a bound is seen as an indication of general hardness of the problem. Technically, the existence of a polynomial kernel for the problem of interest is shown to imply NP ⊆ coNP/poly. But this inclusion is unlikely as it would cause a collapse of the polynomial hierarchy to the third level [36].

In order to link the existence of a polynomial kernel for LCR(D, L) with the above inclusion, we follow the framework developed in [5]. Let $\Gamma$ be an alphabet. A *polynomial equivalence relation* is an equivalence relation $R$ on $\Gamma^*$ with the following properties: Given $x, y \in \Gamma^*$, it can be decided in time polynomial in $|x| + |y|$ whether $(x, y) \in R$. Moreover, for each $n$ there are at most polynomially many equivalence classes in $R$ restricted to $\Gamma^{\le n}$.

The key tools for proving kernel lower bounds are cross-compositions: Let $L \subseteq \Gamma^*$ be a language and $Q \subseteq \Gamma^* \times \mathbb{N}$ be a parameterized language. We say that $L$ *cross-composes* into $Q$ if there exists a polynomial equivalence relation $R$ and an algorithm $C$, the *cross-composition*, with the following properties: $C$ takes as input $\varphi_1, \ldots, \varphi_I \in \Gamma^*$, all equivalent under $R$. It computes in time polynomial in $\sum_{\ell=1}^{I} |\varphi_\ell|$ a string $(y, k) \in \Gamma^* \times \mathbb{N}$ such that $(y, k) \in Q$ if and only if there is an $\ell \in [1..I]$ with $\varphi_\ell \in L$. Furthermore, $k \le p(\max_{\ell \in [1..I]} |\varphi_\ell| + \log(I))$ for a polynomial $p$.

It was shown in [5] that a cross-composition of any NP-hard language into a parameterized language $Q$ prohibits the existence of a polynomial kernel for $Q$ unless NP ⊆ coNP/poly. In order to make use of this result, we show how to cross-compose 3-SAT into LCR(D, L). This yields the following:

**Theorem 8.** LCR(D, L) *does not admit a polynomial kernel unless* NP ⊆ coNP/poly*.*

The difficulty of finding a cross-composition is in the restriction on the size of the parameters. This affects D and L: Both parameters are not allowed to depend polynomially on I, the number of given 3-SAT-instances. We resolve the polynomial dependence by encoding the choice of a 3-SAT-instance into the contributors via a binary tree.

*Proof (Idea).* Assume some encoding of Boolean formulas as strings over a finite alphabet. We use the polynomial equivalence relation R defined as follows: Two strings ϕ and ψ are equivalent under R if both encode 3-SAT-instances, and the numbers of clauses and variables coincide. On strings of bounded length, R has polynomially many equivalence classes.

Let the given 3-SAT-instances be $\varphi_1, \dots, \varphi_I$. Every two of them are equivalent under $R$. This means that all $\varphi_\ell$ have the same number of clauses $m$ and use the same set of variables $\{x_1, \dots, x_n\}$. We assume that $\varphi_\ell = C^\ell_1 \wedge \dots \wedge C^\ell_m$.

We construct a program proceeding in three phases. First, it chooses an instance $\varphi_\ell$, then it guesses a valuation for all variables, and in the third phase it verifies that the valuation satisfies $\varphi_\ell$. While the second and the third phase do not cause a dependence of the parameters on $I$, the first phase does. It is not possible to guess a number $\ell \in [1..I]$ and communicate it via the memory, as this would provoke a polynomial dependence of D on $I$.

To implement the first phase without a polynomial dependence, we transmit the indices of the 3-SAT-instances in binary. The leader guesses and writes tuples $(u_1, 1), \dots, (u_{\log(I)}, \log(I))$ with $u_\ell \in \{0, 1\}$ to the memory. This amounts to choosing an instance $\varphi_\ell$ with binary representation $\mathrm{bin}(\ell) = u_1 \dots u_{\log(I)}$.

It is the contributors' task to store this choice. Each time the leader writes a tuple $(u_i, i)$, the contributors read it and branch either to the left, if $u_i = 0$, or to the right, if $u_i = 1$. Hence, in the first phase, the contributors are binary trees with $I$ leaves, each leaf storing the index of an instance $\varphi_\ell$. Since we did not assume that $I$ is a power of 2, there may be computations arriving at leaves that do not represent proper indices. In this case, the computation deadlocks.

The size of $D$ and $P_L$ in the first phase is $O(\log(I))$. This satisfies the size restrictions of a cross-composition.
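The index selection of the first phase can be illustrated by a small sketch. It is not taken from the construction in [9]: the function name `leaf_for_bits` and the 0-based indexing of instances are assumptions made for illustration. The contributor walks a binary tree along the bits written by the leader, and leaves that do not correspond to an instance model the deadlocking computations.

```python
from typing import Optional, Sequence


def leaf_for_bits(bits: Sequence[int], num_instances: int) -> Optional[int]:
    """Walk a binary tree along the leader's bits.

    Returns the 0-based index of the chosen instance, or None if the leaf
    reached does not represent a proper index (the computation deadlocks).
    """
    # Depth of the tree: number of bits needed to address all instances.
    depth = max(1, (num_instances - 1).bit_length())
    if len(bits) != depth:
        raise ValueError(f"expected {depth} bits, got {len(bits)}")
    index = 0
    for u in bits:
        # Branch left on 0 and right on 1.
        index = 2 * index + u
    return index if index < num_instances else None
```

With five instances the tree has depth three, so only three memory symbols per position are needed to transmit the choice, while three of the eight leaves are dead ends.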

For guessing the valuation in the second phase, the system communicates on tuples $(x_i, v)$ with $i \in [1..n]$ and $v \in \{0, 1\}$. The leader guesses such a tuple for each variable and writes it to the memory. Any participating contributor is free to read one of the tuples. After reading, it stores the variable and the valuation.

In the third phase, the satisfiability check is performed as follows: Each contributor that is still active has stored in its current state the chosen instance $\varphi_\ell$, a variable $x_i$, and its valuation $v_i$. Assume that $x_i$, when evaluated to $v_i$, satisfies $C^\ell_j$, the $j$-th clause of $\varphi_\ell$. Then the contributor loops in its current state while writing the symbol $\#_j$. The leader waits to read the string $\#_1 \dots \#_m$. If $P_L$ succeeds, we are sure that the $m$ clauses of $\varphi_\ell$ were satisfied by the chosen valuation. Thus, $\varphi_\ell$ is satisfiable and $P_L$ moves to its final state. For details of the construction, we refer to the full version of the paper [9]. □

#### **3.2 Parameterization by Contributors**

We show that the size of the contributors C has a wide influence on the complexity of LCR. We give an algorithm singly exponential in C, provide a matching lower bound, and prove the absence of a polynomial kernel.

**Upper Bound.** Our algorithm is based on saturation. We keep the states reachable by the contributors in a set and saturate it. This leads to a more compact representation of the program. Technically, we reduce LCR to a reachability problem on a finite automaton. The result is as follows.

**Proposition 9.** LCR *can be solved in time* $O(4^C \cdot L^4 \cdot D^3 \cdot C^2)$*.*

The main observation is that keeping one set of states for all contributors suffices to represent a computation. Let $S \subseteq Q_C$ be the set of states reachable by the contributors in a given computation. By the *Copycat Lemma* [16], we can assume for each $q \in S$ an arbitrary number of contributors that are currently in state $q$. This means that we do not have to distinguish between different contributor instances.

Formally, we reduce the search space to $Q_L \times D \times \mathcal{P}(Q_C)$. Instead of storing explicit configurations, we store tuples $(q_L, a, S)$, where $q_L \in Q_L$, $a \in D$, and $S \subseteq Q_C$. Between such tuples, the transition relation is as follows. Transitions of the leader change the state and the memory as expected. The contributors also change the memory but saturate $S$ instead of changing the state. Formally, if there is a transition from $q \in S$ to $q'$, we add $q'$ to $S$.

**Lemma 10.** *There is a* $t \in \mathbb{N}$ *so that* $c^0 \rightarrow^*_{A^t} c$ *with* $c \in C^f$ *if and only if there is a run from* $(q^0_L, a^0, \{q^0_C\})$ *to a state in* $F_L \times D \times \mathcal{P}(Q_C)$*.*

The dominant factor in the complexity estimation of Proposition 9 is the time needed to construct the state space. It takes time $O(4^C \cdot L^4 \cdot D^3 \cdot C^2)$. For the definition and the proof of Lemma 10, we refer to the full version [9].
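The saturation idea can be sketched as a breadth-first search over tuples consisting of a leader state, a memory value, and a set of contributor states. The sketch below is a minimal illustration under stated assumptions, not the automaton construction of [9]: transitions are encoded as hypothetical tuples `(source, op, value, target)` with `op` being `"r"` (read) or `"w"` (write), and the search is not tuned to meet the bound of Proposition 9.

```python
from collections import deque
from typing import FrozenSet, Iterable, Set, Tuple

# (source state, "r" or "w", memory value, target state); assumed encoding.
Transition = Tuple[str, str, int, str]


def lcr_saturation(
    leader: Iterable[Transition],
    contributor: Iterable[Transition],
    q0_leader: str,
    q0_contrib: str,
    a0: int,
    final_leader: Set[str],
) -> bool:
    """Search tuples (q_L, a, S); S only grows (saturation)."""
    leader = list(leader)
    contributor = list(contributor)
    start = (q0_leader, a0, frozenset([q0_contrib]))
    seen = {start}
    queue = deque([start])
    while queue:
        q_l, a, s = queue.popleft()
        if q_l in final_leader:
            return True
        successors = []
        # The leader changes its state and the memory as expected.
        for src, op, val, dst in leader:
            if src != q_l:
                continue
            if op == "w":
                successors.append((dst, val, s))
            elif op == "r" and val == a:
                successors.append((dst, a, s))
        # Contributors change the memory but only add states to S.
        for src, op, val, dst in contributor:
            if src not in s:
                continue
            if op == "w":
                successors.append((q_l, val, s | {dst}))
            elif op == "r" and val == a:
                successors.append((q_l, a, s | {dst}))
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

Because the set component never shrinks, a contributor state that has been reached once stays available for all later steps, which mirrors the use of the Copycat Lemma.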

**Lower Bound and Absence of a Polynomial Kernel.** We present two lower bounds for LCR. The first is based on ETH: We show that there is no $2^{o(C)}$-time algorithm for LCR unless ETH fails. This indicates that the above algorithm is asymptotically optimal. Technically, we give a reduction from $n$-variable 3-SAT to LCR such that the size of the contributor in the constructed instance is $O(n)$. Then a $2^{o(C)}$-time algorithm for LCR yields a $2^{o(n)}$-time algorithm for 3-SAT, a contradiction to ETH.

With a similar reduction, one can cross-compose 3-SAT into LCR(C). This shows that the problem does not admit a polynomial kernel. The precise constructions and proofs can be found in the full version [9].

**Proposition 11.**

*(a)* LCR *cannot be solved in time* $2^{o(C)}$ *unless* ETH *fails.*

*(b)* LCR(*C*) *does not admit a polynomial kernel unless* NP ⊆ coNP/poly*.*

#### **4 Bounded-Stage Reachability**

The *bounded-stage reachability problem* is a simultaneous reachability problem. It asks whether all threads of a program can reach an unsafe state when restricted to $s$-stage computations. These are computations where the write permission changes $s$ times. The problem was first analyzed in [1] and shown to be NP-complete for finite-state programs. We give matching upper and lower bounds in terms of fine-grained complexity and prove the absence of a polynomial kernel.

Let $A = (D, a^0, (P_i)_{i \in [1..t]})$ be a program. A *stage* is a computation in $A$ where only one of the threads writes. The remaining threads are restricted to reading the memory. An $s$*-stage computation* is a computation that can be split into $s$ parts, each of which forms a stage.

*Bounded-Stage Reachability* (BSR)

**Input:** A program $A = (D, a^0, (P_i)_{i \in [1..t]})$, a set $C^f \subseteq C$, and $s \in \mathbb{N}$.

**Question:** Is there an $s$-stage computation $c^0 \rightarrow^*_A c$ for some $c \in C^f$?
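To make the notion of a stage concrete, the following sketch counts the stages of a computation that is abstracted to the sequence of thread identifiers performing its write operations (reads are ignored). The abstraction, the function names, and the convention that a computation with fewer than s stages also counts as an s-stage computation are assumptions made for illustration.

```python
from typing import Hashable, Sequence

_NO_WRITER = object()  # sentinel: no thread has written yet


def count_stages(writers: Sequence[Hashable]) -> int:
    """Minimal number of stages: maximal blocks of writes by one thread."""
    stages = 0
    previous = _NO_WRITER
    for w in writers:
        if w != previous:
            # The write permission moves to another thread: new stage.
            stages += 1
            previous = w
    return stages


def is_s_stage(writers: Sequence[Hashable], s: int) -> bool:
    """Check whether the computation fits into at most s stages."""
    return count_stages(writers) <= s
```

For instance, the write sequence 1, 1, 2, 2, 1 needs three stages, so it is a 3-stage computation but not a 2-stage one.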

We focus on a parameterization of BSR by P, the maximum number of states of a thread, and t, the number of threads. Let it be denoted by BSR(P, t). We prove that the parameterization is FPT and present a matching lower bound. The main result in this section is the absence of a polynomial kernel for BSR(P, t). The result is technically involved and reveals hardness of the problem.

Parameterizations of BSR involving D and s, the number of stages, are not interesting for fine-grained complexity theory. We can show that BSR is NP-hard even for constant D and s. This immediately rules out FPT algorithms in these parameters. For details, we refer to the full version of the paper [9].

**Upper Bound.** We show that BSR(P, t) is fixed-parameter tractable. The idea is to reduce to reachability on a product automaton. The automaton stores the configurations, the current writer, and counts up to the number of stages $s$. To this end, it has $O^*(P^t)$ many states. Details can be found in the full version [9].

**Proposition 12.** BSR *can be solved in time* $O^*(P^{2t})$*.*

**Lower Bound.** By a reduction from $k \times k$ Clique, we show that a $2^{o(t \cdot \log(P))}$-time algorithm for BSR would contradict ETH. Hence, the above algorithm is asymptotically optimal.

**Proposition 13.** BSR *cannot be solved in time* $2^{o(t \cdot \log(P))}$ *unless* ETH *fails.*

The reduction maps an instance of $k \times k$ Clique to an equivalent instance $(A = (D, a^0, (P_i)_{i \in [1..t]}), C^f, s)$ of BSR. Moreover, it keeps the parameters small. We have that $P = O(k^2)$ and $t = O(k)$. As a consequence, a $2^{o(t \cdot \log(P))}$-time algorithm for BSR would yield an algorithm for $k \times k$ Clique running in time $2^{o(k \cdot \log(k^2))} = 2^{o(k \cdot \log(k))}$. But this contradicts ETH.
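The parameter arithmetic behind the last step is a routine calculation, spelled out here for convenience (assuming $k \geq 2$ and logarithms to base 2):

$$t \cdot \log(P) = O(k) \cdot \log\big(O(k^2)\big) = O(k) \cdot O(2 \log(k)) = O(k \cdot \log(k)), \qquad \text{so} \qquad 2^{o(t \cdot \log(P))} = 2^{o(k \cdot \log(k^2))} = 2^{o(k \cdot \log(k))}.$$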

*Proof (Idea).* For the reduction, let $V = [1..k] \times [1..k]$ be the vertices of $G$. We define $D = V \cup \{a^0\}$ to be the domain of the memory. We want the threads to communicate on the vertices of $G$. For each row we introduce a reader thread $P_i$ that is responsible for storing a particular vertex of the row. We also add one writer, $P_{ch}$, that is used to steer the communication between the $P_i$. Our program $A$ is given by $(D, a^0, ((P_i)_{i \in [1..k]}, P_{ch}))$.

Intuitively, the program proceeds in two phases. In the first phase, each $P_i$ non-deterministically chooses a vertex from the $i$-th row and stores it in its state space. This constitutes a clique candidate $(1, j_1), \dots, (k, j_k) \in V$. In the second phase, thread $P_{ch}$ starts to write a random vertex $(1, j'_1)$ of the first row to the memory. The first thread $P_1$ reads $(1, j'_1)$ from the memory and verifies that the read vertex is actually the one from the clique candidate. The computation in $P_1$ will deadlock if $j'_1 \neq j_1$. The threads $P_i$ with $i \neq 1$ also read $(1, j'_1)$ from the memory. They have to check whether there is an edge between the stored vertex $(i, j_i)$ and $(1, j'_1)$. If this fails in some $P_i$, the computation in that thread will also deadlock. After this procedure, the writer $P_{ch}$ guesses a vertex $(2, j'_2)$ and writes it to the memory. Now the verification steps repeat. After $k$ repetitions of the procedure, we can ensure that the guessed clique candidate is indeed a clique. Note that the whole communication takes one stage. Details are given in [9]. □
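The verification rounds can be mimicked sequentially. The sketch below is an illustration under assumptions, not the program of [9]: rows and columns are 0-based, edges are given as a set of two-element `frozenset`s of vertices, and only the non-deadlocking guess of the writer (the vertex actually stored by the row's reader) is explored in each round.

```python
from itertools import product
from typing import FrozenSet, Sequence, Set, Tuple

Vertex = Tuple[int, int]  # (row, column), both 0-based (assumption)


def verify_candidate(candidate: Sequence[int], edges: Set[FrozenSet[Vertex]]) -> bool:
    """candidate[i] is the column stored by reader thread P_i."""
    k = len(candidate)
    for r in range(k):
        # Round r: the writer announces the vertex stored by P_r.
        announced = (r, candidate[r])
        for i in range(k):
            if i == r:
                continue  # P_r only checks that its own vertex was announced
            stored = (i, candidate[i])
            if frozenset((stored, announced)) not in edges:
                return False  # thread P_i deadlocks
    return True


def has_kxk_clique(k: int, edges: Set[FrozenSet[Vertex]]) -> bool:
    """Brute force over all clique candidates with one vertex per row."""
    return any(verify_candidate(c, edges) for c in product(range(k), repeat=k))
```

The brute-force enumeration replaces the non-deterministic choice of the first phase; it is exponential in k and only meant to make the checks of the second phase explicit.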

**Absence of a Polynomial Kernel.** We show that BSR(P, t) does not admit a polynomial kernel. To this end, we cross-compose 3-SAT into BSR(P, t).

**Theorem 14.** BSR(*P*, *t*) *does not admit a polynomial kernel unless* NP ⊆ coNP/poly*.*

In the present setting, coming up with a cross-composition is non-trivial. Both parameters, P and t, are not allowed to depend polynomially on the number I of given 3-SAT-instances. Hence, we cannot construct an NFA that distinguishes the I instances by branching into I different directions. This would cause a polynomial dependence of P on I. Furthermore, it is not possible to construct an NFA for each instance as this would cause such a dependence of t on I. To circumvent the problems, some deeper understanding of the model is needed.

*Proof (Idea).* Let $\varphi_1, \dots, \varphi_I$ be given 3-SAT-instances, where each two are equivalent under $R$, the polynomial equivalence relation of Theorem 8. Then each $\varphi_\ell$ has $m$ clauses and $n$ variables $\{x_1, \dots, x_n\}$. We assume $\varphi_\ell = C^\ell_1 \wedge \dots \wedge C^\ell_m$.

In the program that we construct, the communication is based on 4-tuples of the form $(\ell, j, i, v)$. Intuitively, such a tuple transports the following information: The $j$-th clause in instance $\varphi_\ell$, $C^\ell_j$, can be satisfied by variable $x_i$ with valuation $v$. Hence, our data domain is $D = ([1..I] \times [1..m] \times [1..n] \times \{0, 1\}) \cup \{a^0\}$.

For choosing and storing a valuation of the $x_i$, we introduce so-called variable threads $P_{x_1}, \dots, P_{x_n}$. In the beginning, each $P_{x_i}$ non-deterministically chooses a valuation for $x_i$ and stores it in its states.

We further introduce a writer $P_w$. During a computation, this thread guesses exactly $m$ tuples $(\ell_1, 1, i_1, v_1), \dots, (\ell_m, m, i_m, v_m)$ in order to satisfy $m$ clauses of potentially different instances. Each $(\ell_j, j, i_j, v_j)$ is written to the memory by $P_w$. All variable threads then start to read the tuple. If $P_{x_i}$ with $i \neq i_j$ reads it, then the thread will just move one state further since the suggested tuple does not affect the variable $x_i$. If $P_{x_i}$ with $i = i_j$ reads the tuple, the thread will only continue its computation if $v_j$ coincides with the value that $P_{x_i}$ guessed for $x_i$ and, moreover, $x_i$ with value $v_j$ satisfies clause $C^{\ell_j}_j$.

Now suppose the writer did exactly $m$ steps while each variable thread did exactly $m + 1$ steps. This proves the satisfiability of $m$ clauses by the chosen valuation. But these clauses can be part of different instances: It is not ensured that the clauses were chosen from one formula $\varphi_\ell$. The major difficulty of the cross-composition lies in how to ensure exactly this.

We overcome the difficulty by introducing so-called bit checkers $P_b$, where $b \in [1..\log(I)]$. Each $P_b$ is responsible for the $b$-th bit of $\mathrm{bin}(\ell)$, the binary representation of $\ell$, where $\varphi_\ell$ is the instance we want to satisfy. When $P_w$ writes a tuple $(\ell_1, 1, i_1, v_1)$ for the first time, each $P_b$ reads it and stores either 0 or 1, according to the $b$-th bit of $\mathrm{bin}(\ell_1)$. After $P_w$ has written a second tuple $(\ell_2, 2, i_2, v_2)$, the bit checker $P_b$ tests whether the $b$-th bits of $\mathrm{bin}(\ell_1)$ and $\mathrm{bin}(\ell_2)$ coincide; otherwise it will deadlock. This will be repeated any time $P_w$ writes a new tuple to the memory.

Assume the computation does not deadlock in any of the $P_b$. Then we can ensure that the $b$-th bit of $\mathrm{bin}(\ell_j)$ with $j \in [1..m]$ never changed during the computation. This means that $\mathrm{bin}(\ell_1) = \dots = \mathrm{bin}(\ell_m)$. Hence, the writer $P_w$ has chosen clauses of just one instance $\varphi_\ell$ and, with the current valuation, it is possible to satisfy the formula. Since the parameters are bounded, $P \in O(m)$ and $t \in O(n + \log(I))$, the construction constitutes a proper cross-composition. For a formal construction and proof, we refer to the full version [9]. □
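The effect of the bit checkers can be illustrated as follows. The sketch abstracts each written tuple to its instance index and lets one loop iteration play the role of one bit checker; the function name and the 0-based bit positions are assumptions made for illustration.

```python
from typing import Sequence


def bit_checkers_accept(instance_indices: Sequence[int], num_bits: int) -> bool:
    """Return True iff no bit checker deadlocks on the written indices."""
    for b in range(num_bits):  # one bit checker per bit position
        stored = None
        for ell in instance_indices:
            bit = (ell >> b) & 1
            if stored is None:
                stored = bit  # first tuple: remember the b-th bit
            elif bit != stored:
                return False  # the b-th bit changed: this checker deadlocks
    return True
```

Each checker only remembers a single bit, yet together the checkers accept exactly those sequences in which all indices coincide, provided that `num_bits` bits suffice to represent every index.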

### **5 Conclusion**

We studied several parameterizations of LCR and BSR, two safety verification problems for shared-memory concurrent programs. For LCR, we identified the parameters D, L, and C. Our first algorithm showed that LCR(D, L) is FPT. Then, we used a modification of the algorithm to obtain a verification procedure valuable for practical instances. The main insight was that due to a factorization along strongly connected components, the impact of L can be reduced to a polynomial factor in the time complexity. We also proved the absence of a polynomial kernel for LCR(D, L) and presented a lower bound which is a root factor away from the upper bound. For LCR(C) we gave a tight upper and lower bound.

The parameters of interest for BSR are P and t. We have shown that BSR(P, t) is FPT and gave a matching lower bound. The main contribution was to prove it unlikely that a polynomial kernel exists for BSR(P, t). The proof relies on a technically involved cross-composition that avoids a polynomial dependence of the parameters on the number of given 3-SAT-instances.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Parameterized Verification of Synchronization in Constrained Reconfigurable Broadcast Networks**

A. R. Balasubramanian<sup>1</sup>, Nathalie Bertrand<sup>2</sup>, and Nicolas Markey<sup>2(✉)</sup>

<sup>1</sup> Chennai Mathematical Institute, Chennai, India <sup>2</sup> Univ. Rennes, Inria, CNRS, IRISA, Rennes, France nicolas.markey@irisa.fr

**Abstract.** Reconfigurable broadcast networks provide a convenient formalism for modelling and reasoning about networks of mobile agents broadcasting messages to other agents following some (evolving) communication topology. The parameterized verification of such models aims at checking whether a given property holds irrespective of the initial configuration (number of agents, initial states and initial communication topology). We focus here on the synchronization property, asking whether all agents converge to a set of target states after some execution. This problem is known to be decidable in polynomial time when no constraints are imposed on the evolution of the communication topology (while it is undecidable for static broadcast networks).

In this paper we investigate how various constraints on reconfigurations affect the decidability and complexity of the synchronization problem. In particular, we show that when bounding the number of reconfigured links between two communication steps by a constant, synchronization becomes undecidable; on the other hand, synchronization remains decidable in PTIME when the bound grows with the number of agents.

#### **1 Introduction**

There are numerous application domains for networks formed of an arbitrary number of anonymous agents executing the same code: prominent examples are distributed algorithms, communication protocols, cache-coherence protocols, and biological systems such as populations of cells or individuals, etc. The automated verification of such systems is challenging [3,8,12,15]: its aim is to validate at once all instances of the model, independently of the (parameterized) number of agents. Such a problem can be phrased in terms of infinite-state-system verification. Exploiting symmetries may lead to efficient algorithms for the verification of relevant properties [7].

Different means of interactions between agents can be considered in such networks, depending on the application domain. Typical examples are shared variables [4,10,13], *rendez-vous* [12], and broadcast communications [6,9]. In this paper, we target ad hoc networks [6], in which the agents can broadcast messages simultaneously to all their neighbours, *i.e.*, to all the agents that are within their radio range. The number of agents and the communication topology are fixed once and for all at the beginning of the execution. Parameterized verification of broadcast networks checks if a specification is met independently of the number of agents and communication topology. It is usually simpler to reason about the dual problem of the existence of an initial configuration (consisting of a network size, an initial state for each agent, and a communication topology) from which some execution violates the given specification.

This work has been supported by the Indo-French research unit UMI Relax, and by ERC project EQualIS (308087).

© The Author(s) 2018

D. Beyer and M. Huisman (Eds.): TACAS 2018, LNCS 10806, pp. 38–54, 2018. https://doi.org/10.1007/978-3-319-89963-3\_3

Several types of specifications have been considered in the literature. We focus here on coverability and synchronization: *does there exist an initial configuration from which some agent (resp. all agents at the same time) may reach a particular set of target states*. Both problems are undecidable; decidability of coverability can be regained by bounding the length of simple paths in the communication topology [6].

In the case of mobile ad hoc networks (MANETs), agents are mobile, so that the communication links (and thus the neighbourhood of each agent) may evolve over time. To reflect the mobility of agents, Delzanno *et al.* studied *reconfigurable* broadcast networks [5,6]. In such networks, the communication topology can change arbitrarily at any time. Perhaps surprisingly, this modification not only allows for a more faithful modelling of MANETs, but it also leads to decidability of both the coverability and the synchronization problems [6]. A probabilistic extension of reconfigurable broadcast networks has been studied in [1,2] to model randomized protocols.

A drawback of the semantics of reconfigurable broadcast networks is that they allow arbitrary changes at each reconfiguration. Such arbitrary reconfigurations may not be realistic, especially in settings where communications are frequent enough, and mobility is slow and not chaotic. In this paper, we limit the impact of reconfigurations in several ways, and study how those limitations affect the decidability and complexity of parameterized verification of synchronization.

More specifically, we restrict reconfigurations by limiting the number of changes in the communication graph, either by considering *global* constraints (on the total number of edges being modified), or by considering *local* constraints (on the number of updates affecting each individual node). We prove that synchronization is decidable when imposing constant local constraints, as well as when imposing global constraints depending (as a divergent function) on the number of agents. On the other hand, imposing a constant global bound makes synchronization undecidable. We recover decidability by bounding the maximal degree of each node by 1.

#### **2 Broadcast Networks with Constrained Reconfiguration**

In this section, we first define reconfigurable broadcast networks; we then introduce several constraints on reconfigurations along executions, and investigate how they compare one to another and with unconstrained reconfigurations.

**Fig. 1.** Example of a broadcast protocol

**Fig. 2.** Sample execution under reconfigurable semantics, synchronizing to $\{q_4, q_6, q_8\}$ ($B$-transitions are communication steps, $R$-transitions are reconfiguration steps).

#### **2.1 Reconfigurable Broadcast Networks**

**Definition 1.** A broadcast protocol *is a tuple* $\mathcal{P} = (Q, I, \Sigma, \Delta)$ *where* $Q$ *is a finite set of control states;* $I \subseteq Q$ *is the set of initial control states;* $\Sigma$ *is a finite alphabet; and* $\Delta \subseteq (Q \times \{!!a, ??a \mid a \in \Sigma\} \times Q)$ *is the transition relation.*

A (reconfigurable) broadcast network is a system made of several copies of a single broadcast protocol $\mathcal{P}$. Configurations of such a network are undirected graphs in which each node is labelled with a state of $\mathcal{P}$. Transitions between configurations can either be reconfigurations of the communication topology (*i.e.*, changes in the edges of the graph), or a communication via broadcast of a message (*i.e.*, changes in the labelling of the graph). Figures 1 and 2 respectively display an example of a broadcast protocol and of an execution of a network made of three copies of that protocol.

Formally, we first define undirected labelled graphs. Given a set $\mathcal{L}$ of labels, an $\mathcal{L}$*-graph* is an undirected graph $\mathsf{G} = (N, E, L)$ where $N$ is a finite set of nodes; $E \subseteq \mathcal{P}_2(N)$<sup>1</sup> (notice in particular that such a graph has no self-loops); finally, $L \colon N \to \mathcal{L}$ is the labelling function. We let $\mathcal{G}_{\mathcal{L}}$ denote the (infinite) set of $\mathcal{L}$-labelled graphs. Given a graph $\mathsf{G} \in \mathcal{G}_{\mathcal{L}}$, we write $n \sim n'$ whenever $\{n, n'\} \in E$ and we let $\mathsf{Neigh}_{\mathsf{G}}(n) = \{n' \mid n \sim n'\}$ be the neighbourhood of $n$, *i.e.* the set of nodes adjacent to $n$. For a label $\ell$, we denote by $|\mathsf{G}|_\ell$ the number of nodes in $\mathsf{G}$ labelled by $\ell$. Finally $L(\mathsf{G})$ denotes the set of labels appearing in nodes of $\mathsf{G}$.

<sup>1</sup> For a finite set $S$ and $1 \leq k \leq |S|$, we let $\mathcal{P}_k(S) = \{T \subseteq S \mid |T| = k\}$.

The semantics of a reconfigurable broadcast network based on broadcast protocol $\mathcal{P}$ is an infinite-state transition system $\mathcal{T}(\mathcal{P})$. The configurations of $\mathcal{T}(\mathcal{P})$ are $Q$-labelled graphs. Intuitively, each node of such a graph runs protocol $\mathcal{P}$, and may send/receive messages to/from its neighbours. A configuration $(N, E, L)$ is said to be *initial* if $L(N) \subseteq I$. From a configuration $\mathsf{G} = (N, E, L)$, two types of steps are possible. More precisely, there is a step from $(N, E, L)$ to $(N', E', L')$ if one of the following two conditions holds:

**(reconfiguration step)** $N' = N$ and $L' = L$: a reconfiguration step does not change the set of nodes and their labels, but may change the edges arbitrarily;

**(communication step)** $N' = N$, $E' = E$, and there exist $n \in N$ and $a \in \Sigma$ such that $(L(n), !!a, L'(n)) \in \Delta$, and for every $n'$, if $n' \in \mathsf{Neigh}_{\mathsf{G}}(n)$, then $(L(n'), ??a, L'(n')) \in \Delta$, otherwise $L'(n') = L(n')$: a communication step reflects how nodes evolve when one of them broadcasts a message to its neighbours.
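The two conditions can be transcribed into a small checker. This is a sketch under assumptions rather than a reference implementation: labellings are dictionaries from nodes to states, edges are two-element `frozenset`s, transitions are encoded as hypothetical tuples `(q, kind, a, q2)` with `kind` being `"!!"` or `"??"`, and the "otherwise" clause is read as applying to nodes other than the sender.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Set, Tuple

Node = Hashable
Labels = Dict[Node, str]
Delta = Set[Tuple[str, str, str, str]]  # (q, "!!" or "??", a, q2); assumed encoding


def neighbours(edges: Iterable[FrozenSet[Node]], n: Node) -> Set[Node]:
    """Nodes adjacent to n in the undirected graph."""
    return {m for e in edges if n in e for m in e if m != n}


def is_reconfiguration_step(labels: Labels, new_labels: Labels) -> bool:
    """Nodes and labels are unchanged; the edges may change arbitrarily."""
    return labels == new_labels


def is_communication_step(edges: Set[FrozenSet[Node]], labels: Labels,
                          new_labels: Labels, delta: Delta,
                          alphabet: Iterable[str]) -> bool:
    """The edges are kept fixed; some node broadcasts to all its neighbours."""
    if labels.keys() != new_labels.keys():
        return False
    letters = list(alphabet)
    for n in labels:
        neigh = neighbours(edges, n)
        for a in letters:
            if (labels[n], "!!", a, new_labels[n]) not in delta:
                continue
            # Neighbours must take a reception; all other nodes stay put.
            if all(
                ((labels[m], "??", a, new_labels[m]) in delta) if m in neigh
                else new_labels[m] == labels[m]
                for m in labels if m != n
            ):
                return True
    return False
```

Note that a neighbour without a matching reception blocks the broadcast in this reading; Example 1 avoids this by routing unspecified receptions to a sink state.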

An *execution* of the reconfigurable broadcast network is a sequence $\rho = (\mathsf{G}_i)_{0 \leq i \leq r}$ of configurations such that for any $i < r$, there is a step from $\mathsf{G}_i$ to $\mathsf{G}_{i+1}$ and $\rho$ strictly alternates communication and reconfiguration steps (the latter possibly being trivial). An execution is *initial* if it starts from an initial configuration.

An important ingredient that we heavily use in the sequel is *juxtaposition* of configurations and *shuffling* of executions. The juxtaposition of two configurations $\mathsf{G} = (N, E, L)$ and $\mathsf{G}' = (N', E', L')$ is the configuration $\mathsf{G} \oplus \mathsf{G}' = (N \uplus N', E \uplus E', L_\oplus)$, in which $L_\oplus$ extends both $L$ and $L'$: $L_\oplus(n) = L(n)$ if $n \in N$ and $L_\oplus(n) = L'(n)$ if $n \in N'$. We write $\mathsf{G}^2$ for the juxtaposition of $\mathsf{G}$ with itself, and, inductively, $\mathsf{G}^N$ for the juxtaposition of $\mathsf{G}^{N-1}$ with $\mathsf{G}$. A shuffle of two executions $\rho = (\mathsf{G}_i)_{0 \leq i \leq r}$ and $\rho' = (\mathsf{G}'_j)_{0 \leq j \leq r'}$ is an execution $\rho_\oplus$ from $\mathsf{G}_0 \oplus \mathsf{G}'_0$ to $\mathsf{G}_r \oplus \mathsf{G}'_{r'}$ obtained by interleaving $\rho$ and $\rho'$. Note that a reconfiguration step in $\rho_\oplus$ may be composed of reconfigurations from both $\rho$ and $\rho'$. We write $\rho \oplus \rho'$ for the set of shuffle executions obtained from $\rho$ and $\rho'$.
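Juxtaposition is a disjoint union of labelled graphs. A minimal sketch, assuming configurations are given as triples of a node set, a set of two-element `frozenset` edges, and a label dictionary, and making the union disjoint by tagging nodes with 0 or 1 (the tagging is an implementation choice of this sketch):

```python
from typing import Dict, FrozenSet, Hashable, Set, Tuple

Config = Tuple[Set[Hashable], Set[FrozenSet], Dict[Hashable, str]]


def juxtapose(g1: Config, g2: Config) -> Config:
    """Disjoint union of two configurations; no edges are added between them."""
    (n1, e1, l1), (n2, e2, l2) = g1, g2
    # Tag every node with the side it comes from to keep the union disjoint.
    nodes = {(0, n) for n in n1} | {(1, n) for n in n2}
    edges = ({frozenset((0, n) for n in e) for e in e1}
             | {frozenset((1, n) for n in e) for e in e2})
    labels = {(0, n): l1[n] for n in n1}
    labels.update({(1, n): l2[n] for n in n2})
    return nodes, edges, labels
```

Juxtaposing a configuration with itself doubles the number of nodes carrying each label, which is the property exploited when copies of an execution are shuffled.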

Natural decision problems for reconfigurable broadcast networks include checking whether some node may reach a target state, or whether all nodes may synchronize to a set of target states. More precisely, given a broadcast protocol $\mathcal{P}$ and a subset $F \subseteq Q$, the *coverability problem* asks whether there exists an initial execution $\rho$ that visits a configuration $\mathsf{G}$ with $L(\mathsf{G}) \cap F \neq \emptyset$, and the *synchronization problem* asks whether there exists an initial execution $\rho$ that visits a configuration $\mathsf{G}$ with $L(\mathsf{G}) \subseteq F$. For unconstrained reconfigurations, we have:

**Theorem 2** ([5,6,11]). *The coverability and synchronization problems are decidable in* PTIME *for reconfigurable broadcast protocols.*

*Remark 1.* The synchronization problem was proven decidable in [6], and PTIME membership was given in [11, p. 41]. The algorithm consists in computing the set of states of P that are both reachable (*i.e.*, coverable) from an initial configuration and co-reachable from a target configuration. This can be performed by applying iteratively the algorithm of [5] for computing the set of reachable states (with reversed transitions for computing co-reachable states).
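The computation of the reachable states can be pictured as a saturation. The sketch below is one plausible reading under unconstrained reconfiguration and is not claimed to reproduce the algorithm of [5]: a state is added if it is the target of a broadcast from a state already in the set, or the target of a reception whose message can be broadcast from a state already in the set. Transitions use the hypothetical encoding `(q, kind, a, q2)` with `kind` being `"!!"` or `"??"`.

```python
from typing import Iterable, Set, Tuple

Transition = Tuple[str, str, str, str]  # (q, "!!" or "??", a, q2); assumed encoding


def coverable_states(initial: Iterable[str], delta: Iterable[Transition]) -> Set[str]:
    """Saturate the set of states reachable by some copy of the protocol."""
    transitions = list(delta)
    reach = set(initial)
    changed = True
    while changed:
        changed = False
        # Messages that some already reachable state can broadcast.
        broadcastable = {a for (q, kind, a, _) in transitions
                         if kind == "!!" and q in reach}
        for q, kind, a, q2 in transitions:
            if q in reach and q2 not in reach and (kind == "!!" or a in broadcastable):
                reach.add(q2)
                changed = True
    return reach
```

Co-reachable states could be obtained in the same spirit on reversed transitions, as the remark indicates; that variant is not shown here.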

*Example 1.* Consider the broadcast protocol of Fig. 1 with $I = \{q_0\}$. From each state, unspecified message receptions lead to an (omitted) sink state; this way, each broadcast message triggers a transition in all the neighbouring copies.

For that broadcast protocol, one easily sees that it is possible to synchronize to the set $\{q_4, q_6, q_8\}$. Moreover, three copies are necessary and sufficient for that objective, as witnessed by the execution of Fig. 2. The initial configuration has three copies and two edges. If the central node broadcasts $a$, the other two nodes receive, one proceeding to $q_5$ and the other to $q_7$. Then, we assume the communication topology is emptied before the same node broadcasts $b$, moving to $q_2$. Finally the node in $q_5$ connects to the one in $q_2$ to communicate on $c$ and then disconnects, followed by a similar communication on $d$ initiated by the node in $q_7$.

#### **2.2 Natural Constraints for Reconfiguration**

Allowing arbitrary changes in the network topology may look unrealistic. In order to address this issue, we introduce several ways of bounding the number of reconfigurations after each communication step. For this, we consider the following natural pseudometric between graphs, which for simplicity we call *distance*.

**Definition 3.** *Let* $\mathsf{G} = (N, E, L)$ *and* $\mathsf{G}' = (N', E', L')$ *be two* $\mathcal{L}$*-labelled graphs. The distance between* $\mathsf{G}$ *and* $\mathsf{G}'$ *is defined as*

$$\mathsf{dist}(\mathsf{G}, \mathsf{G}') = |(E \cup E') \setminus (E \cap E')|$$

*when* $N = N'$ *and* $L = L'$*, and* $\mathsf{dist}(\mathsf{G}, \mathsf{G}') = 0$ *otherwise.*

Setting the "distance" to 0 for two graphs that do not agree on the set of nodes or on the labelling function might seem strange at first. This choice is motivated by the definition of constraints on executions (see below) and of the number of reconfigurations along an execution (see Sect. 2.3). Other distances may be of interest in this context; in particular, for a fixed node $n \in N$, we let $\mathsf{dist}_n(\mathsf{G}, \mathsf{G}')$ be the number of edges involving node $n$ in the symmetric difference of $E$ and $E'$ (still assuming $N = N'$ and $L = L'$).
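Both distances amount to counting edges in a symmetric difference. A minimal sketch, assuming configurations are triples of a node set, a set of two-element `frozenset` edges, and a label dictionary; returning 0 from `dist_node` when nodes or labels differ is a convention of this sketch, since the text only defines the local distance when they agree.

```python
from typing import Dict, FrozenSet, Hashable, Set, Tuple

Config = Tuple[Set[Hashable], Set[FrozenSet], Dict[Hashable, str]]


def dist(g1: Config, g2: Config) -> int:
    """Number of edges in the symmetric difference, or 0 if nodes/labels differ."""
    (n1, e1, l1), (n2, e2, l2) = g1, g2
    if n1 != n2 or l1 != l2:
        return 0
    return len(e1 ^ e2)


def dist_node(g1: Config, g2: Config, n: Hashable) -> int:
    """Edges of the symmetric difference that involve node n."""
    (n1, e1, l1), (n2, e2, l2) = g1, g2
    if n1 != n2 or l1 != l2:
        return 0  # convention of this sketch, see the lead-in
    return sum(1 for e in e1 ^ e2 if n in e)
```

Since a communication step changes labels, its distance is 0 under this definition, so only reconfiguration steps contribute when distances are summed along an execution.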

*Constant Number of Reconfigurations per Step.* A first natural constraint on reconfiguration consists in bounding the number of changes in a reconfiguration step by a constant number. Recall that along executions, communication and reconfiguration steps strictly alternate.

**Definition 4.** *Let* $k \in \mathbb{N}$*. An execution* $\rho = (\mathsf{G}_i)_{0 \le i \le r}$ *of a reconfigurable broadcast network is* $k$-constrained *if, for every index* $i < r$*, it holds that* $\mathsf{dist}(\mathsf{G}_i, \mathsf{G}_{i+1}) \le k$*.*

*Example 1 (Contd).* For the synchronization problem, bounding the number of reconfigurations makes a difference. The sample execution from Fig. 2 is not 1-constrained, and actually no 1-constrained executions of that broadcast protocol can synchronize to {q4, q5, q6}. This can be shown by exhibiting and proving an invariant on the reachable configurations (see Lemma 10).

*Beyond Constant Number of Reconfigurations per Step.* Bounding the number of reconfigurations per step by a constant is somewhat restrictive, especially when this constant does not depend on the size of the network. We introduce other kinds of constraints here, for instance by bounding the number of reconfigurations by k *on average* along the execution, or by having a bound that depends on the number of nodes executing the protocol.

For a finite execution $\rho = (\mathsf{G}_i)_{0 \le i \le r}$ of a reconfigurable broadcast network, we write $\mathsf{nb\_comm}(\rho)$ for the number of communication steps along $\rho$ (notice that $\lfloor r/2 \rfloor \le \mathsf{nb\_comm}(\rho) \le \lceil r/2 \rceil$ since we require strict alternation between reconfiguration and communication steps), and $\mathsf{nb\_reconf}(\rho)$ for the total number of edge reconfigurations in $\rho$, that is $\mathsf{nb\_reconf}(\rho) = \sum_{i=0}^{r-2} \mathsf{dist}(\mathsf{G}_i, \mathsf{G}_{i+1})$.

**Definition 5.** *Let* $k \in \mathbb{N}$*. An execution* $\rho$ *of a reconfigurable broadcast network is said to be* $k$-balanced *if it starts and ends with a communication step, and satisfies* $\mathsf{nb\_reconf}(\rho) \le k \cdot (\mathsf{nb\_comm}(\rho) - 1)$*.*

This indeed captures our intuition that along a k-balanced execution, reconfiguration steps update at most k links *on average*: such an execution has one reconfiguration step fewer than it has communication steps.
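The balance condition can be checked mechanically. The sketch below uses a hypothetical encoding of an execution as a list of steps, where the string `"c"` stands for a communication step and an integer `d` stands for a reconfiguration step that changes `d` edges. The encoding and the function names are illustrative assumptions, not notation from the paper.

```python
def nb_comm(steps):
    """Number of communication steps."""
    return sum(1 for s in steps if s == "c")


def nb_reconf(steps):
    """Total number of edge reconfigurations."""
    return sum(s for s in steps if s != "c")


def is_k_balanced(steps, k):
    """Definition 5: starts and ends with a communication step, and
    nb_reconf <= k * (nb_comm - 1). Strict alternation is assumed."""
    if not steps or steps[0] != "c" or steps[-1] != "c":
        return False
    return nb_reconf(steps) <= k * (nb_comm(steps) - 1)


steps = ["c", 2, "c", 0, "c"]      # hypothetical execution
print(is_k_balanced(steps, 1))     # True: 2 <= 1 * (3 - 1)
print(is_k_balanced(["c", 3, "c", 0, "c"], 1))  # False: 3 > 2
```

Note that the first example is 1-balanced although one of its reconfiguration steps changes two edges, so it is not 1-constrained: the bound only holds on average.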

Finally, we will also consider two relevant ways to constrain reconfigurations depending on the size of the network: first locally, bounding the number of reconfigurations *per node* by a constant; second globally, bounding the total number of reconfigurations by a function of the number of nodes.

We first bound reconfigurations locally.

**Definition 6.** *Let* $k \in \mathbb{N}$*. An execution* $\rho = (\mathsf{G}_i)_{0 \le i \le r}$ *of a reconfigurable broadcast network is* $k$-locally-constrained *if, for every node* $n$ *and for every index* $i < r$*,* $\mathsf{dist}_n(\mathsf{G}_i, \mathsf{G}_{i+1}) \le k$*.*

One may also bound the number of reconfigurations globally using bounding functions that depend on the number of nodes in the network:

**Definition 7.** *Let* $f \colon \mathbb{N} \to \mathbb{N}$ *be a function. An execution* $\rho = (\mathsf{G}_i)_{0 \le i \le r}$ *of a reconfigurable broadcast network is* $f$-constrained *if, writing* $n$ *for the number of nodes in* $\mathsf{G}_0$*, it holds that* $\mathsf{dist}(\mathsf{G}_i, \mathsf{G}_{i+1}) \le f(n)$ *for any* $i < r$*.*

Notice that if $f$ is the constant function $n \in \mathbb{N} \mapsto k$ for some $k \in \mathbb{N}$, $f$-constrained executions coincide with $k$-constrained ones, so that our terminology is unambiguous. Other natural bounding functions are non-decreasing and *diverging*, *i.e.* $\forall n.\ \exists k.\ f(k) \ge n$. This way, the number of possible reconfigurations tends to infinity when the network size grows.
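The three per-step constraints (Definitions 4, 6 and 7) differ only in what is measured and against which bound. The following sketch makes this explicit. It assumes a hypothetical encoding of a configuration as a triple `(nodes, edges, labels)` and of an execution as a list of such triples; all function names are illustrative.

```python
def dist(g, h):
    """Symmetric difference of edge sets; 0 if nodes or labels differ."""
    (n1, e1, l1), (n2, e2, l2) = g, h
    return len(e1 ^ e2) if (n1 == n2 and l1 == l2) else 0


def dist_node(g, h, node):
    """Edges of the symmetric difference that involve `node`."""
    (n1, e1, l1), (n2, e2, l2) = g, h
    if n1 != n2 or l1 != l2:
        return 0
    return sum(1 for e in e1 ^ e2 if node in e)


def consecutive(execution):
    """Pairs (G_i, G_{i+1}) along the execution."""
    return list(zip(execution, execution[1:]))


def is_f_constrained(execution, f):
    bound = f(len(execution[0][0]))   # f applied to the number of nodes
    return all(dist(g, h) <= bound for g, h in consecutive(execution))


def is_k_constrained(execution, k):
    return is_f_constrained(execution, lambda n: k)   # constant function


def is_k_locally_constrained(execution, k):
    nodes = execution[0][0]
    return all(dist_node(g, h, n) <= k
               for g, h in consecutive(execution) for n in nodes)


# Hypothetical example: a star with centre 0 loses its three edges at once.
labels = {0: "q", 1: "q", 2: "q", 3: "q"}
star = (frozenset(labels), frozenset(frozenset((0, i)) for i in (1, 2, 3)), labels)
empty = (frozenset(labels), frozenset(), labels)
execution = [star, empty]

print(is_k_constrained(execution, 2))           # False: 3 edges change
print(is_f_constrained(execution, lambda n: n)) # True: 3 <= 4 nodes
print(is_k_locally_constrained(execution, 1))   # False: the centre loses 3 edges
```

Defining `is_k_constrained` through the constant bounding function mirrors the remark that both notions coincide in that case.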

*Remark 2.* Coverability under constrained reconfigurations is easily observed to be equivalent to coverability with unconstrained reconfigurations: from an unconstrained execution, we can simply juxtapose extra copies of the protocol, which would perform extra communication steps so as to satisfy the constraint. When dealing with synchronization, this technique does not work since the extra copies would also have to synchronize to a target state. As a consequence, we only focus on synchronization in the rest of this paper.

### **2.3 Classification of Constraints**

In this section, we compare our restrictions. We prove that, for the synchronization problem, k-locally-constrained and f-constrained reconfigurations, for diverging functions f, are equivalent to unconstrained reconfigurations. On the other hand, we prove that k-constrained reconfigurations are equivalent to k-balanced reconfigurations, and do not coincide with unconstrained reconfigurations.

### *Equivalence Between Unconstrained and Locally-Constrained Reconfigurations.*

**Lemma 8.** *Let* $\mathcal{P}$ *be a broadcast protocol,* $F \subseteq Q$ *be a target set, and* $f$ *be a non-decreasing diverging function. If the reconfigurable broadcast network defined by* $\mathcal{P}$ *has an initial execution synchronizing in* $F$*, then it has an* $f$*-constrained initial execution synchronizing in* $F$*.*

*Proof.* We first prove the lemma for the identity function $\mathsf{Id}$. More precisely, we prove that for an execution $\rho = (\mathsf{G}_i)_{0 \le i \le n}$ of the reconfigurable broadcast network, there exists an $\mathsf{Id}$-constrained execution $\rho' = (\mathsf{G}'_j)_{0 \le j \le m}$, whose last transition (if any) is a communication step, and such that for any control state $q$, $|\mathsf{G}_n|_q = |\mathsf{G}'_m|_q$. We reason by induction on the length of the execution. The claim is obvious for $n = 0$. Suppose the property is true for all naturals less than or equal to some $n \in \mathbb{N}$, and consider an execution $\rho = (\mathsf{G}_i)_{0 \le i \le n+1}$. The induction hypothesis ensures that there is an $\mathsf{Id}$-constrained execution $\rho' = (\mathsf{G}'_j)_{0 \le j \le m}$ with $|\mathsf{G}_n|_q = |\mathsf{G}'_m|_q$ for all $q$. If the last transition from $\mathsf{G}_n$ to $\mathsf{G}_{n+1}$ in $\rho$ is a reconfiguration step, then the execution $\rho'$ witnesses our claim. Otherwise, the transition from $\mathsf{G}_n$ to $\mathsf{G}_{n+1}$ is a communication step, involving a broadcasting node $\mathsf{n}$ of $\mathsf{G}_n$ labelled with $q$, and receiving nodes $\mathsf{n}_1$ to $\mathsf{n}_r$ of $\mathsf{G}_n$, respectively labelled with $q_1$ to $q_r$. By hypothesis, $\mathsf{G}'_m$ also contains a node $\mathsf{n}'$ labelled with $q$ and $r$ nodes $\mathsf{n}'_1$ to $\mathsf{n}'_r$, labelled with $q_1$ to $q_r$. We then add two steps after $\mathsf{G}'_m$ in $\rho'$: we first reconfigure the graph so that $\mathsf{Neigh}_{\mathsf{G}'_{m+1}}(\mathsf{n}') = \{\mathsf{n}'_i \mid 1 \le i \le r\}$, which requires changing at most $|\mathsf{G}_0| - 1$ links, and then perform the same broadcast/receive transitions as between $\mathsf{G}_n$ and $\mathsf{G}_{n+1}$.

For the general case of the lemma, suppose $f$ is a non-decreasing diverging function. Further, let $\rho = (\mathsf{G}_i)_{0 \le i \le n}$ be an $\mathsf{Id}$-constrained execution, and pick $k$ such that $f(k \cdot |\mathsf{G}_0|) \ge |\mathsf{G}_0|$. Consider the initial configuration $\mathsf{G}_0^k$, made of $k$ copies of $\mathsf{G}_0$, and the execution, denoted $\rho^k$, made of $k$ copies of $\rho$ running independently from each of the $k$ copies of $\mathsf{G}_0$ in $\mathsf{G}_0^k$. Each reconfiguration step involves at most $|\mathsf{G}_0|$ links, so that $\rho^k$ is $f$-constrained.
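The choice of the number of copies in this argument can be illustrated by a small search. The sketch below is only an illustration under the assumption that `f` is non-decreasing and diverging (otherwise the loop may not terminate); the function name `copies_needed` is hypothetical.

```python
import math


def copies_needed(f, size):
    """Smallest k >= 1 with f(k * size) >= size, where `size` plays the
    role of |G_0|. Terminates when f is non-decreasing and diverging."""
    k = 1
    while f(k * size) < size:
        k += 1
    return k


# With the integer square root as bounding function and |G_0| = 5:
print(copies_needed(math.isqrt, 5))  # 5, since isqrt(25) = 5 but isqrt(20) = 4
```

The slower the bounding function grows, the more copies are needed, which is why equivalence with the unconstrained semantics may come at the price of a larger network.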

**Lemma 9.** *Let* $\mathcal{P}$ *be a broadcast protocol with* $F \subseteq Q$ *a target set. If the reconfigurable broadcast network defined by* $\mathcal{P}$ *has an initial execution synchronizing in* $F$*, then it has a* 1*-locally-constrained initial execution synchronizing in* $F$*.*

*k-Constrained and k-Balanced Reconfigurations.* We prove here that k-constrained and k-balanced reconfigurations are equivalent w.r.t. synchronization, and that they are strictly stronger than our other restrictions. We begin with the latter:

**Lemma 10.** *There exists a broadcast protocol* $\mathcal{P}$ *and a set* $F \subseteq Q$ *of target states for which synchronization is possible from some initial configuration when unconstrained reconfigurations are allowed, and impossible from every initial configuration when only* 1*-constrained reconfigurations are allowed.*

A protocol with this property is the one from Example 1, for which we exhibited a 2-constrained synchronizing execution. It can be proved that no 1-constrained synchronizing executions exist for this protocol, whatever the number of copies. We now prove the main result of this section:

**Theorem 11.** *Let* $\mathcal{P}$ *be a broadcast protocol and* $F \subseteq Q$*. There exists a* $k$*-constrained initial execution synchronizing in* $F$ *if, and only if, there exists a* $k$*-balanced initial execution synchronizing in* $F$*.*

*Proof.* The left-to-right implication is simple: if there is a k-constrained initial execution synchronizing in F, w.l.o.g. we can assume that this execution starts and ends with a communication step; moreover, each reconfiguration step contains at most k edge reconfigurations, so that the witness execution is k-balanced.

For the converse implication, let $\rho = (\mathsf{G}_i)_{0 \le i \le n}$ be a $k$-balanced execution synchronizing in $F$ and starting and ending with communication steps (hence $n$ is odd). We define the potential $(p_i)_{0 \le i \le n}$ of $\rho$ as the sequence of $n + 1$ integers obtained as follows:

– $p_0 = 0$;

– $p_{2i+1} = p_{2i} + k$ for $i \le (n-1)/2$ (this corresponds to a communication step);

– $p_{2i+2} = p_{2i+1} - \mathsf{dist}(\mathsf{G}_{2i+1}, \mathsf{G}_{2i+2})$ for $i \le (n-1)/2 - 1$ (reconfiguration step).

That $\rho$ is $k$-balanced translates as $p_{n-1} \ge 0$: the sequence $(p_i)_{0 \le i \le n}$ stores the value of $k \cdot \mathsf{nb\_comm}(\rho_{\le i}) - \mathsf{nb\_reconf}(\rho_{\le i})$ for each prefix $\rho_{\le i}$ of $\rho$; being $k$-balanced means that $p_n \ge k$, and since the last step is a communication step, this in turn means $p_{n-1} \ge 0$. On the other hand, in order to be $k$-constrained, it is necessary (but not sufficient) to have $p_i \ge 0$ for all $0 \le i \le n$.
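A short sketch may help to see how the potential separates the two notions. It reuses a hypothetical step encoding (`"c"` for a communication step, an integer `d` for a reconfiguration step changing `d` edges); the encoding and function names are assumptions made for illustration.

```python
def potential(steps, k):
    """Sequence (p_i): +k on each communication step, -dist on each
    reconfiguration step, starting from p_0 = 0."""
    p = [0]
    for s in steps:
        p.append(p[-1] + k if s == "c" else p[-1] - s)
    return p


def balanced_from_potential(p, k):
    """k-balanced executions end with p_n >= k (equivalently p_{n-1} >= 0)."""
    return p[-1] >= k


def may_be_constrained(p):
    """Necessary (not sufficient) condition for being k-constrained."""
    return min(p) >= 0


steps = ["c", 2, "c", 0, "c"]   # hypothetical execution, n = 5 steps
p = potential(steps, 1)
print(p)                              # [0, 1, -1, 0, 0, 1]
print(balanced_from_potential(p, 1))  # True
print(may_be_constrained(p))          # False: the potential dips to -1
```

The example is 1-balanced but its potential becomes negative, so it cannot be 1-constrained; this is the situation that the shuffling construction below has to repair.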

We build a $k$-constrained execution by shuffling several copies of $\rho$. We actually begin with the case where $k = 1$, and then extend the proof to any $k$. We first compute how many copies we need. For this, we split $\rho$ into several phases, based on the potential $(p_i)_{0 \le i \le n}$ defined above. A phase is a maximal segment of $\rho_{\le n-1}$ (the prefix of $\rho$ obtained by dropping the last (communication) step) along which the sign of the potential is constant (or zero): graphs $\mathsf{G}_i$ and $\mathsf{G}_j$ are in the same phase if, and only if, for all $i \le l \le l' \le j$, it holds that $p_l \cdot p_{l'} \ge 0$. We decompose $\rho$ as the concatenation of phases $(\rho_j)_{0 \le j \le m}$; since $\rho$ is $k$-balanced, $m$ is even, and $\rho_0$, $\rho_m$, and all even-numbered phases are *non-negative* phases (*i.e.*, the potential is non-negative along those executions), while all odd-numbered phases are *non-positive* phases. Also, all phases end with potential zero, except possibly for $\rho_m$. See Fig. 3 for an example of a decomposition into phases.

**Lemma 12.** *For any phase* $\rho_i = \mathsf{G}_{b_i} \cdots \mathsf{G}_{e_i}$ *of a* 1*-balanced execution* $\rho = \mathsf{G}_0 \cdots \mathsf{G}_n$*, there exists* $\kappa_i \le (e_i - b_i)/2$ *such that for any* $N \in \mathbb{N}$*, there exists a* 1*-constrained execution from* $\mathsf{G}_0^{\kappa_i} \oplus \mathsf{G}_{b_i}^N$ *to* $\mathsf{G}_1^{\kappa_i} \oplus \mathsf{G}_{e_i}^N$*.*

*Proof.* We handle non-negative and non-positive phases separately. In a non-negative phase, we name *repeated reconfiguration step* any reconfiguration step that immediately follows another (possibly from the previous phase) reconfiguration step (so that if there are four consecutive reconfiguration steps, the last three are said to be repeated); similarly, we name *repeated communication step* any communication step that is immediately followed (possibly in the next phase) by another communication step (hence the first three of four consecutive communication steps are repeated).

We first claim that any non-negative phase contains at least as many repeated communication steps as it contains repeated reconfiguration steps. Indeed, any non-repeated communication step in a non-negative phase is necessarily followed by a non-repeated reconfiguration step, and conversely, and non-negative phases have at least as many communication steps as they have reconfiguration steps.

As a consequence, we can number all repeated reconfiguration steps from 1 (earliest) to $\kappa_i$ (latest), for some $\kappa_i$, and similarly for repeated communication steps. Clearly enough, in a non-negative phase, for any $1 \le j \le \kappa_i$, the repeated communication step numbered $j$ occurs before the repeated reconfiguration step carrying the same number.

We now build our 1-constrained execution from $\mathsf{G}_0^{\kappa_i} \oplus \mathsf{G}_{b_i}^N$ to $\mathsf{G}_1^{\kappa_i} \oplus \mathsf{G}_{e_i}^N$. We begin with a first part, where only the components starting from $\mathsf{G}_{b_i}$ move:


Notice that the number of copies involved in this process is arbitrary. The process lasts as long as some copies may advance within phase $\rho_i$. Hence, when the process stops, all copies of the original system either have reached the end of $\rho_i$, or are stopped before a repeated reconfiguration step. For the copies in the latter situation, we use the copies starting from $\mathsf{G}_0$. It remains to prove that having $\kappa_i$ such copies is enough to make all processes reach the end of $\rho_i$.

For this, we first assume that the potential associated with $\rho_i$ ends with value zero. This must be the case of all phases except the last one, which we handle after the general case. We first notice that in the execution we are currently building, any repeated communication step performed by any (but the very first) copy that started from $\mathsf{G}_{b_i}$ is always followed by a repeated reconfiguration step. Similarly, any non-repeated communication step of any copy is followed by a non-repeated reconfiguration step of the same copy. As a consequence, the potential associated with the global execution we are currently building never exceeds the total number of repeated communication steps performed by the first copy; hence it is bounded by $\kappa_i$, whatever the number $N$ of copies involved. As a consequence, at most $\kappa_i$ communication steps are sufficient in order to advance all copies that started from $\mathsf{G}_{b_i}$ to the end of $\rho_i$.

**Fig. 3.** Phases of a 1-balanced execution, and correspondence between repeated communication steps (loosely dotted blue steps) and repeated reconfiguration steps (densely dotted red steps) (Color figure online)

Finally, the case of the last phase ρ<sup>m</sup> (possibly ending with positive potential) is easily handled, since it has more communication steps than reconfiguration steps.

The proof for non-positive phases is similar.

Pick a 1-balanced execution $\rho = \mathsf{G}_0 \cdots \mathsf{G}_n$, and decompose it into phases $\rho_1 \cdots \rho_m$. For each phase $\rho_i$, we write $\kappa_i$ for the total number of repeated reconfiguration steps, and we let $\kappa = \sum_{1 \le i \le m} \kappa_i$ for the total number of repeated reconfiguration steps along $\rho$. Notice that $\kappa \le n/2$.

**Lemma 13.** *For every* 1*-balanced execution* $\rho = \mathsf{G}_0 \cdots \mathsf{G}_n$*, and for every* $N \in \mathbb{N}$*, there exists a* 1*-constrained execution from* $\mathsf{G}_1^{N} \oplus \mathsf{G}_{e_m}^{\kappa N}$ *to* $\mathsf{G}_n^{N + \kappa N}$*.*

Combining the above two lemmas, we obtain the following proposition, which refines the statement of Theorem 11:

**Proposition 14.** *For every* 1*-balanced execution* $\rho = \mathsf{G}_0 \cdots \mathsf{G}_n$ *and every* $N \ge \kappa^2 + \kappa$*, there exists a* 1*-constrained execution from* $\mathsf{G}_0^N$ *to* $\mathsf{G}_n^N$*.*

We finally extend this result to k > 1. In this case, splitting ρ into phases is not as convenient as when k = 1: indeed, a non-positive phase might not end with potential zero (because communication steps make the potential jump by k units). Lemma 12 would not hold in this case.

We circumvent this problem by first shuffling k copies of ρ in such a way that reconfigurations can be gathered into groups of size exactly k. This way, we can indeed split the resulting execution into non-negative and non-positive phases, always considering reconfigurations of size exactly k; we can then apply the techniques above in order to build a synchronizing k-constrained execution. This completes our proof.

### **3 Parameterized Synchronization Under Reconfiguration Constraints**

### **3.1 Undecidability for** *k***-Constrained Reconfiguration**

Although synchronization is decidable in PTIME [6,11] for reconfigurable broadcast networks, the problem becomes undecidable when reconfigurations are k-constrained.

**Theorem 15.** *The synchronization problem is undecidable for reconfigurable broadcast networks under* k*-constrained reconfigurations.*

*Proof.* We prove this undecidability result for 1-constrained reconfigurations, by giving a reduction from the halting problem for Minsky machines [14]. We begin with some intuition. The state space of our protocol has two types of states:


Incrementations and decrementations can then be performed by creating a link with a node in $\mathsf{zero}_j$ and sending this node to $\mathsf{one}_j$, or sending a $\mathsf{one}_j$-node to $\mathsf{zero}_j$ and removing the link.

In order to implement this, we have to take care of the facts that we may have several control nodes in our network, that we may have links between two control nodes or between two counter nodes, or that links between control nodes and counter nodes may appear or disappear at random. Intuitively, those problems will be handled as follows:


– control nodes will periodically run special broadcasts that would send any connected nodes (except nodes in state $\mathsf{one}_j$) to a sink state, thus preventing synchronization. This way, we ensure that a particular control node is *clean*. Initially, we require that control nodes have no connections at all.

**Fig. 5.** Modules for simulating incrementation and decrementation/zero test

**Fig. 6.** The part of the protocol for counter nodes

**Fig. 7.** Parts of the protocol for auxiliary nodes

We now present the detailed construction, depicted in Figs. 4, 5, 6 and 7. Each state of the protocol is actually able to synchronize with all the messages. Some transitions are not represented in the figures, to preserve readability: all nodes with no outgoing transitions (i.e., state $L_{\mathsf{halt}}$ corresponding to the halting state, as well as states $\mathsf{zero}_j$ and $\mathsf{done}_i$) actually carry a self-loop synchronizing on all messages; all other omitted transitions lead to a sink state, which is not part of the target set.

Let us explain the intended behaviour of the incrementation module of Fig. 5: when entering the module, our control node $n$ in state $L$ is linked to $c_1$ counter nodes in state $\mathsf{one}_1$ and to $c_2$ counter nodes in state $\mathsf{one}_2$; it has no other links. Moreover, all auxiliary nodes are either in state $\mathsf{free}_i$ or in state $\mathsf{done}_i$. Running through the incrementation module from $L$ will use one counter node $m$ in state $\mathsf{zero}_j$ (which is used to effectively encode the increase of counter $c_j$) and four auxiliary nodes $a_1$ (initially in state $\mathsf{free}_1$), $a_2$ (in state $\mathsf{free}_2$), and $a_3$ and $a'_3$ (in state $\mathsf{free}_3$).

The execution then runs as follows:


After this sequence of steps, node $n$ has an extra link to a counter node in state $\mathsf{one}_j$, which indeed corresponds to incrementing counter $c_j$. Moreover, no nodes have been left in an intermediary state. A similar analysis can be done for the second module, which implements the zero-test and decrementation. This way, we can prove that if the two-counter machine has a halting computation, then there is an initial configuration of our broadcast protocol from which there is an execution synchronizing in the set $F$ formed of the halting control state and states $\mathsf{one}_j$, $\mathsf{zero}_j$ and $\mathsf{done}_i$.

It now remains to prove the other direction. More precisely, we prove that from a 1-constrained synchronizing execution of the protocol, we can extract a synchronizing execution in some normal form, from which we derive a halting execution of the two-counter machine.

Fix a 1-constrained synchronizing execution of the broadcast network. First notice that when a control node $n$ reaches some state $L$ (the first node of an incrementation or decrementation module), it may only be linked to counter nodes in state $\mathsf{one}_j$: this is because states $L$ can only be reached by sending !!*i-exit*, !!*d-exit*, !!*t-exit*, or !!*start*. The former two cases may only synchronize with counter nodes in state $\mathsf{one}_j$; in the other two cases, node $n$ may be linked to no other node. Hence, for a control node $n$ to traverse an incrementation module, it must get links to four auxiliary nodes (in order to receive the four *fr* messages), those four links must be removed (to avoid reaching the sink state), and an extra link has to be created in order to receive message *i-ack*$_j$. In total, traversing an incrementation module takes nine communication steps and at least nine reconfiguration steps. Similarly, traversing a decrementation module via any of the two branches takes at least as many reconfiguration steps as communication steps. In the end, taking into account the initial !!*start* communication step, if a control node $n$ is involved in $B_n$ communication steps, it must be involved in at least $B_n - 1$ reconfiguration steps.

Assume that every control node $n$ is involved in at least $B_n$ reconfiguration steps: then we would have at least as many reconfiguration steps as communication steps, which in a 1-constrained execution is impossible. Hence there must be a control node $n_0$ performing $B_{n_0}$ communication steps and exactly $B_{n_0} - 1$ reconfiguration steps. As a consequence, when traversing an incrementation module, node $n_0$ indeed gets connected to exactly one new counter node, which indeed must be in state $\mathsf{one}_j$ when $n_0$ reaches the first state of the next module. Similarly, traversing a decrementation/zero-test module indeed performs the expected changes. It follows that the sequence of steps involving node $n_0$ encodes a halting execution of the two-counter machine.

The 1-constrained executions in the proof of Theorem 15 have the additional property that all graphs describing configurations are 2-bounded-path configurations. For $K \in \mathbb{N}$, a configuration $\mathsf{G}$ is a $K$-*bounded-path configuration* if the length of all simple paths in $\mathsf{G}$ is bounded by $K$. Note that a constant bound on the length of simple paths implies that the diameter (*i.e.* the length of the longest shortest path between any pair of vertices) is itself bounded. The synchronization problem was proved to be undecidable for broadcast networks *without reconfiguration* when restricting to $K$-bounded-path configurations [6]. In comparison, for reconfigurable broadcast networks under $k$-constrained reconfigurations, the undecidability result stated in Theorem 15 can be strengthened into:

**Corollary 16.** *The synchronization problem is undecidable for reconfigurable broadcast networks under* k*-constrained reconfigurations when restricted either to bounded-path configurations, or to bounded-diameter configurations.*

### **3.2 Decidability Results**

*f-Constrained and k-Locally-Constrained Reconfigurations.* From the equivalence (w.r.t. synchronization) of k-locally-constrained, f-constrained and unconstrained executions (Lemmas 8 and 9), and thanks to Theorem 2, we immediately get:

**Corollary 17.** *Let* $k \in \mathbb{N}$ *and* $f \colon \mathbb{N} \to \mathbb{N}$ *be a non-decreasing diverging function. The synchronization problem for reconfigurable broadcast networks under* $k$*-locally-constrained (resp.* $f$*-constrained) reconfigurations is decidable in* PTIME*.*

*Bounded Degree Topology.* We now return to k-constrained reconfigurations, and explore restrictions that allow one to recover decidability of the synchronization problem. We further restrict k-constrained reconfigurations by requiring that the degree of nodes remains bounded, by 1; in other terms, communications correspond to *rendez-vous* between the broadcasting node and its single neighbour.

**Theorem 18.** *The synchronization problem is decidable for reconfigurable broadcast networks under* k*-constrained reconfiguration when restricted to 1-bounded-degree topologies.*

*Sketch of Proof.* The proof consists in transforming the synchronization problem above into a reachability problem for some Petri net. The Petri net has two kinds of places (plus a few auxiliary places): one place for each state of the protocol, representing isolated nodes (*i.e.*, nodes having no neighbours), and one place for each pair of states of the protocol, representing pairs of connected nodes. Since we restrict to degree-1 topologies, any node of the network is in one of those two configurations. Places representing isolated nodes are simply called *isolated places* in the sequel, while places corresponding to pairs of connected nodes are called *connected places*.

An initialization phase stores tokens in the places described above, so as to represent the initial configuration. In a second phase, the Petri net simulates an execution of the reconfigurable broadcast network: communication steps and (k-constrained) reconfiguration steps are easily encoded as transitions of this Petri net. Communication steps correspond to moving tokens from one place to the place obtained by updating the states as prescribed by the transitions of the broadcast protocol. Atomic reconfigurations may create or remove links, either consuming two tokens in isolated places and adding a token in the corresponding connected place, or the other way around. We use k auxiliary places to count the number of atomic reconfigurations, so as to enforce the k-constraint.

Finally, the Petri net may enter a terminal phase, where it checks synchronization by absorbing all tokens that lie in (isolated or connected) places corresponding to target states. In the end, the simulated execution has been synchronizing if, and only if, no tokens remain in any of the main states.
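The place structure and the atomic reconfiguration transitions described in this proof sketch can be illustrated as follows. This is only a sketch under stated assumptions: markings are encoded as `collections.Counter` multisets, the place names `iso(q)` and `pair(q1, q2)` are hypothetical, and neither the communication transitions nor the k auxiliary counting places are modelled.

```python
from collections import Counter


def iso(q):
    """Isolated place for state q (hypothetical naming)."""
    return ("iso", q)


def pair(q1, q2):
    """Connected place for an unordered pair of states."""
    return ("pair",) + tuple(sorted((q1, q2)))


def fire(marking, consumed, produced):
    """Fire a transition: remove `consumed` tokens, add `produced` ones."""
    m = Counter(marking)
    m.subtract(Counter(consumed))
    if any(v < 0 for v in m.values()):
        raise ValueError("transition not enabled")
    m.update(Counter(produced))
    return +m   # drop places holding zero tokens


def connect(marking, q1, q2):
    """Atomic reconfiguration creating a link between two isolated nodes."""
    return fire(marking, [iso(q1), iso(q2)], [pair(q1, q2)])


def disconnect(marking, q1, q2):
    """Atomic reconfiguration removing the link of a connected pair."""
    return fire(marking, [pair(q1, q2)], [iso(q1), iso(q2)])


def is_synchronized(marking, targets):
    """True when every token lies in a place made of target states only."""
    return all(q in targets
               for place, count in marking.items() if count > 0
               for q in place[1:])


m0 = Counter({iso("q4"): 1, iso("q6"): 1})
m1 = connect(m0, "q4", "q6")
print(m1)                                       # one token in ('pair', 'q4', 'q6')
print(is_synchronized(m1, {"q4", "q6", "q8"}))  # True
```

In the construction sketched in the text, the final check is performed by the net itself, which absorbs tokens from target places; the `is_synchronized` predicate above is a direct test on the marking used here only for illustration.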

#### **4 Conclusion**

Restricting reconfigurations in reconfigurable broadcast networks is natural to better reflect mobility when communications are frequent enough and the movement of nodes is not chaotic. In this paper, we studied how constraints on the number of reconfigurations (at each step and for each node, at each step and globally, or along an execution) change the semantics of networks, in particular with respect to the synchronization problem, and affect its decidability. Our main results are the equivalence of k-constrained and k-balanced semantics, the undecidability of synchronization under k-constrained reconfigurations, and its decidability when restricting to 1-bounded-degree topologies.

As future work, we propose to investigate, beyond the coverability and synchronization problems, richer objectives such as cardinality reachability problems as in [5]. Moreover, for semantics with constrained reconfigurations that are equivalent to the unconstrained one as far as the coverability and synchronization problems are concerned, it would be worth studying the impact of the reconfiguration restrictions (*e.g.* k-locally-constrained or f-constrained) on the minimum number of nodes for which a synchronizing execution exists, and on the minimum number of steps to synchronize.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.


# **EMME: A Formal Tool for ECMAScript Memory Model Evaluation**

Cristian Mattarei<sup>1(✉)</sup>, Clark Barrett<sup>1</sup>, Shu-yu Guo<sup>2</sup>, Bradley Nelson<sup>3</sup>, and Ben Smith<sup>3</sup>

> <sup>1</sup> Stanford University, Stanford, CA, USA, {mattarei,barrett}@cs.stanford.edu
>
> <sup>2</sup> Mozilla, Mountain View, USA, shu@rfrn.org
>
> <sup>3</sup> Google Inc., Mountain View, USA, {bradnelson,binji}@google.com

**Abstract.** Nearly all web-based interfaces are written in JavaScript. Given its prevalence, the support for high performance JavaScript code is crucial. The ECMA Technical Committee 39 (TC39) has recently extended the ECMAScript language (i.e., JavaScript) to support shared memory accesses between different threads. The extension is given in terms of a natural language memory model specification. In this paper we describe a formal approach for validating both the memory model and its implementations in various JavaScript engines. We first introduce a formal version of the memory model and report results on checking the model for consistency and other properties. We then introduce our tool, EMME, built on top of the Alloy analyzer, which leverages the model to generate all possible valid executions of a given JavaScript program. Finally, we report results using EMME together with small test programs to analyze industrial JavaScript engines. We show that EMME can find bugs as well as missed opportunities for optimization.

### **1 Introduction**

As web-based applications written in JavaScript continue to increase in complexity, there is a corresponding need for these applications to interact efficiently with modern hardware architectures. Over the last decade, processor architectures have moved from single-core to multi-core, with the latter now present in the vast majority of both desktop and mobile platforms. In 2012, an extension to JavaScript was standardized [20] which supports the creation of multi-threaded parallel Web Workers with message-passing. More recently, the committee responsible for JavaScript standardization extended the language to support shared memory access [10]. This extension integrates a new datatype called *SharedArrayBuffer* which allows for concurrent memory accesses, thus enabling more efficient multi-threaded program interaction.

Mozilla: at the time this work was done.

C. Mattarei: This work was supported by a research grant from Google. We would also like to thank JF Bastien from Apple for his support of this project.

© The Author(s) 2018

D. Beyer and M. Huisman (Eds.): TACAS 2018, LNCS 10806, pp. 55–71, 2018. https://doi.org/10.1007/978-3-319-89963-3\_4

Given a multi-threaded program that uses shared memory, there can be several possible valid executions of the program, given that reads and writes may concurrently operate on the same shared memory and that every thread can have a different view of it. However, not all behaviors are allowed, and the separation between valid and invalid behaviors is defined by a *memory model*. In one common approach, memory models are specified using axioms, and the correctness of a program execution is determined by checking its consistency with the axioms in the memory model. Given a set of memory operations (i.e., reads and writes) over shared memory, the memory model defines which combinations of written values each read event can observe. Because many different programs can have the same behaviors, the memory model is also particularly important for helping to determine the set of possible optimizations that a compiler can apply to a given program. As an example, a memory model could specify that the only allowed multi-threaded executions are those that are equivalent to a sequential program composed of some interleaving of the events in each thread. This model is the most stringent one and is called sequential consistency. With this approach, all threads observe the same total order of events. However, this model has significant performance limitations. In particular, it requires all cores/processors to synchronize their local cache with each other in order to maintain a coherent order of the memory events. In order to overcome such limitations, weaker memory models have been introduced. The ECMAScript Memory Model is a weak model.
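To make the notion of sequential consistency concrete, the following minimal sketch enumerates every interleaving of a small two-thread program and collects the values its two reads can observe. The program (the classic "store buffering" shape), the event encoding, and all helper names are illustrative assumptions; none of them come from the ECMAScript specification or from EMME.

```python
from itertools import combinations

# Hypothetical two-thread program. Each event is
# (operation, location, value-to-write or register-to-fill).
THREAD_1 = [("W", "x", 1), ("R", "y", "r1")]
THREAD_2 = [("W", "y", 1), ("R", "x", "r2")]


def interleavings(a, b):
    """Yield every merge of a and b that preserves each thread's own order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]


def run(schedule):
    """Execute one interleaving against a single shared memory."""
    memory = {"x": 0, "y": 0}
    registers = {}
    for op, loc, arg in schedule:
        if op == "W":
            memory[loc] = arg
        else:
            registers[arg] = memory[loc]
    return registers["r1"], registers["r2"]


# All (r1, r2) pairs reachable under sequential consistency.
sc_outcomes = {run(s) for s in interleavings(THREAD_1, THREAD_2)}
print(sorted(sc_outcomes))  # [(0, 1), (1, 0), (1, 1)]
```

The outcome (0, 0) never appears, because some write always precedes both reads in any interleaving. Weaker models such as TSO additionally admit (0, 0) for this program shape, which is the kind of extra behavior a weak memory model has to characterize.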

Memory models are notoriously challenging to analyze with conventional testing alone, due to their non-intuitive semantics and formal axiomatic definitions. As a result, formal methods are frequently used in order to verify and validate the correctness of memory models [4–7,18]. Some of these models apply to instruction set architectures, whereas others apply to high-level programming languages. In this work, we use formal methods to validate the ECMAScript Memory Model and to analyze the correctness and performance of different implementations of ECMAScript engines. JavaScript is usually regarded as a high-level programming language, but its memory model is decidedly low-level and more closely matches that of instruction set architectures than that of other languages. The analyses that we provide are based on a formalization of the memory model using the Alloy language [12], which is then combined with a formal translation of the program to be analyzed in order to compute its set of valid executions. This result can then be used to automatically generate litmus tests that can be run on a concrete ECMAScript engine, allowing the developers to evaluate its correctness. The concrete executions observed when running the ECMAScript engine can either be a subset of, be equivalent to, or be a superset of the valid executions. Standard litmus test analyses usually target the latter case (incorrect engine behavior), providing little information in the other cases. However, when the concrete engine's observed executions are a relatively small subset of the valid executions (e.g., 1/5 the size), this can indicate a missed opportunity for code optimization. As part of our work, we introduce a novel approach for such cases that is able to identify specific predicates over the memory model that are always consistent with the executions of the concrete engine, thus providing guidance about where potential optimization opportunities might exist.

The analyses proposed in this paper have been implemented in a tool called **E**CMAScript **M**emory **M**odel **E**valuator (EMME), which has been used to validate the memory model and to test the compliance of all major ECMAScript engines, including Google's V8 [1], Apple's JSC [2], and Mozilla's SpiderMonkey [3].

The rest of the paper is organized as follows: Sect. 2 covers related work on formal analysis of memory models; Sect. 3 describes the ECMAScript Memory Model and its formal representation; Sect. 4 characterizes the analyses that are presented in this paper; Sect. 5 provides an overview of the Alloy translation; Sect. 6 concentrates on the tool implementation and the design choices that were made; Sect. 7 provides an evaluation of the performance of the different techniques proposed in this paper; Sect. 8 describes the results of the analyses performed on the ECMAScript Memory Model and several specific engine implementations; and Sect. 9 provides concluding remarks.

#### **2 Related Work**

Most modern multiprocessor systems implement relaxed memory models, enabling them to deliver better performance when compared to stricter models. Well-known approaches such as Sequential Consistency (SC), Processor Consistency (PC), Relaxed-Memory Order (RMO), Total Store Order (TSO), and Partial Store Order (PSO) are mainly directed towards relaxing the constraints on when read and write operations can be reordered.

The formal analysis of weak memory model hardware implementations has typically been done using SAT-based techniques [5,9]. In [4], a formal analysis based on Coq is used in order to evaluate SC, TSO, PSO, and RMO memory models. The DIY tool developed in [4] generates assembly programs to run against Power and x86 architectures. In contrast, in this work we concentrate on the analysis of the ECMAScript memory model, assuming the processor behavior is correct.

MemSAT [19] is a formal tool, based on Alloy [12], that allows for the verification of axiomatic memory models. Given a program enriched with assertions, MemSAT finds a trace execution (if it exists) where both assertions and the axioms in the memory model are satisfied.

An analysis of the C++ memory model is presented in [6]. The formalization is based on the LEM language [17], and the CPPMem software provides all possible interpretations of a C/C++ program consistent with the memory model. More recently, an approach based on Alloy and oriented towards synthesizing litmus tests is proposed in [14].

In this paper, we build on ideas present in MemSAT and CPPMem to build a tool for JavaScript. Our EMME tool can provide the set of valid executions for a given input JavaScript program, and it can also generate litmus tests suitable for evaluating the correctness of JavaScript engine implementations. In contrast to previous work, we also analyze situations where the litmus tests provide correct results but expose a discrepancy between the number of observed behaviors in the implementation and what is possible given the specification.

**Fig. 1.** Concurrent program example

**Fig. 2.** Shared memory views

### **3 The ECMAScript Memory Model**

The objective of the ECMAScript Memory Model is to precisely define when an execution of a concurrent program that relies on shared memory is valid. From the point of view of the Memory Model, a JavaScript program can be abstracted as a set of threads, each of them composed of an ordered set of shared memory events. Each memory event has a set of attributes that specify its: operation (*Read*, *Write*, or *ReadModifyWrite*); ordering (*SeqCst*, *Unordered*, or *Init*); tear type (whether a single read operation can read from two different writes to the same location); (source or destination) memory block and address; payload value; and modify operation (in the case of a *ReadModifyWrite*). The shared memory is essentially an array of bytes, and a memory operation reads, writes, or modifies it. In these operations, the bytes can be interpreted either as *signed/unsigned integer* values or as *floating point* values. For instance, in Fig. 2, the notation x-I16[1] represents an access to the memory block x starting at index 1, where the bytes are interpreted as 16-bit signed integers (i.e., I16), while x-F32[0] stands for a 32-bit floating point value starting at byte 0.

Formally, a program is defined as a set of events E and a partial order between them, namely the *Agent Order*, that encodes the thread structure. For the example in Fig. 1, the set of events is defined as E = {ev1W<sup>1</sup>, ev2W<sup>2</sup>, ev3R<sup>2</sup>, ev4R<sup>3</sup>, ev5W<sup>3</sup>, ev6W<sup>3</sup>}, with agent order AO = AO<sup>1</sup> ∪ AO<sup>2</sup> ∪ AO<sup>3</sup>, where AO<sup>1</sup>, AO<sup>2</sup>, and AO<sup>3</sup> are the agent orders for each thread: AO<sup>1</sup> = {}, AO<sup>2</sup> = {(ev2W<sup>2</sup>, ev3R<sup>2</sup>)}, and AO<sup>3</sup> = {(ev4R<sup>3</sup>, ev5W<sup>3</sup>), (ev4R<sup>3</sup>, ev6W<sup>3</sup>), (ev5W<sup>3</sup>, ev6W<sup>3</sup>)}.

The execution semantics of a program is given by the *Reads Bytes From* (RBF) relation, a ternary relation which relates two events and a single byte index i, with the interpretation that the first event reads the byte at index i which was written by the second event. Looking again at the example in Fig. 1, one of the possible valid assignments to the RBF relation is {(ev4R<sup>3</sup>, ev1W<sup>1</sup>, 0), (ev3R<sup>2</sup>, ev2W<sup>2</sup>, 0), (ev3R<sup>2</sup>, ev6W<sup>3</sup>, 1)}, meaning that the *Read* event ev4R<sup>3</sup> reads byte 0 from ev1W<sup>1</sup> (taking the else branch), and ev3R<sup>2</sup> reads byte 0 from ev2W<sup>2</sup> and byte 1 from ev6W<sup>3</sup>.
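As a rough illustration of these definitions, the sketch below writes the events, the agent order, and the RBF assignment of the running example as plain Python sets. The string names mirror the event names in the text; the helper functions `sources` and `well_formed` are hypothetical and are not part of EMME.

```python
# Events of the running example (the trailing digit is the thread index).
E = {"ev1W1", "ev2W2", "ev3R2", "ev4R3", "ev5W3", "ev6W3"}

# Agent order: pairs (earlier, later) of events within one thread.
AO = {
    ("ev2W2", "ev3R2"),
    ("ev4R3", "ev5W3"), ("ev4R3", "ev6W3"), ("ev5W3", "ev6W3"),
}

# Reads Bytes From: triples (reader, writer, byte index).
RBF = {("ev4R3", "ev1W1", 0), ("ev3R2", "ev2W2", 0), ("ev3R2", "ev6W3", 1)}


def sources(rbf, reader):
    """Map each byte index read by `reader` to the event that wrote it."""
    return {i: w for (r, w, i) in sorted(rbf) if r == reader}


def well_formed(events, ao, rbf):
    """Check that both relations only mention known events and valid indices."""
    return (all(a in events and b in events for a, b in ao)
            and all(r in events and w in events and i >= 0 for r, w, i in rbf))


print(well_formed(E, AO, RBF))  # True
print(sources(RBF, "ev3R2"))    # {0: 'ev2W2', 1: 'ev6W3'}
```

Representing RBF per byte rather than per event is what allows a single read, such as ev3R2 here, to combine bytes written by two different events.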

The combination of a (finite) set of events E = {e<sub>1</sub>, ..., e<sub>n</sub>}, an agent order AO ⊆ E × E, and a *Reads Bytes From* relation RBF ⊆ E × E × ℕ identifies a *Candidate Execution*, and the purpose of the Memory Model is to partition the set of *Candidate Executions* into *Valid* and *Invalid* executions. The separation is defined as a formula that is satisfiable if and only if the *Candidate Execution* is *Valid*. Given a *Candidate Execution*, the Memory Model constructs a set of supporting relations in order to assess its validity:


Finally, a *Candidate Execution* is valid when the following predicates hold:


#### **3.1 Formal Representation**

The formalization of the ECMAScript Memory Model is based on the formal definition of a *Memory Operation*, shown in Definition 1.

**Definition 1 (Memory Operation).** *A Memory Operation is a tuple* ⟨ID, O, T, R, B, M, A⟩ *where:*


Note that this definition differs slightly from the one used in [10] (though the underlying semantics are the same). The differences make the model easier to reason about formally and include:


All relations in [10] (i.e., RBF, RF, SW, HB, and MO) are included in the formal model, and their semantics are defined using set operations, while the predicates (i.e., CR, TFR, and SCA) are expressed as formulas. The resulting formulation of the Memory Model, combining all constraints and predicates, is shown in Eq. (1). Details of our implementation of this formulation are given in Sect. 5.

$$\begin{aligned} MM(E, AO, RF, RBF, SW, HB, MO) := {} & \phi\_{RBF}(RBF, E) \wedge \phi\_{RF}(RF, E, RBF) \wedge \phi\_{SW}(SW, E, RF) \\ & \wedge \phi\_{HB}(HB, E, AO, SW) \wedge \phi\_{MO}(MO, E, HB, SW) \\ & \wedge CR(E, HB, RBF) \wedge TFR(E, RF) \wedge SCA(MO) \end{aligned} \tag{1}$$

### **4 Formal Analyses**

The design and development of a critical (software or hardware) system often follows a process in which high-level requirements (such as the standards committee's specification of the memory model) are used to guide an actual implementation. This process can be integrated with different formal analyses to ensure that the result is a faithful implementation with respect to the requirements. In this section, we describe the set of analyses that we used to validate the requirements and implementations of the ECMAScript Memory Model. Results of our analyses are reported in Sect. 8.

#### **4.1 Formal Requirements Validation**

The ECMAScript Memory Model defines a set of *constraints* which together make up a formula (Eq. (1)). The solutions of this formula are the valid executions. The Memory Model also lists a number of *assertions*, formulas that are expected to be true in every valid execution (and thus must follow from the constraints). Complete formal requirements validation would require checking two things: (i) the constraints are consistent with each other, i.e., they contain no contradictions; and (ii) each assertion is logically entailed by the set of constraints in the Memory Model. However, because we used Alloy (see Sect. 5), we were unable to show full logical entailment, as Alloy can only reason about a finite number of events. We instead showed that (i) and (ii) hold for finite sets of events up to a certain size. In future work, we plan to explore using an SMT solver to see if we can prove unbounded entailment in some cases. When (i) or (ii) does not hold, there is a bug in either the requirements or the formal modeling of the requirements. To help debug problems with (i), we used the unsat core feature of Alloy, which identifies a subset of the constraints that are inconsistent. To further aid debugging, we labeled each constraint c<sub>i</sub> with a Boolean activation variable av<sub>i</sub> (i.e., we replaced c<sub>i</sub> with (av<sub>i</sub> → c<sub>i</sub>) ∧ av<sub>i</sub>). This allowed us to inspect the unsat core for activation variables and immediately discern which constraints were active in producing the unsatisfiable result.
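The role of activation variables can be sketched with a toy example. Below, four made-up constraints over three Boolean variables are each guarded by a name that plays the part of av<sub>i</sub>, and a brute-force search stands in for the solver's unsat core by looking for the smallest set of active constraints that is already inconsistent. The constraints, the names, and the brute-force procedure are assumptions for illustration only; Alloy computes cores differently and does not guarantee minimality.

```python
from itertools import combinations, product

# Toy constraints over Boolean variables (a, b, c), keyed by the name of
# their activation variable. Formulas are invented for this example.
CONSTRAINTS = {
    "c1": lambda a, b, c: a or b,
    "c2": lambda a, b, c: not a,
    "c3": lambda a, b, c: not b,
    "c4": lambda a, b, c: c,
}


def satisfiable(active):
    """True if some assignment satisfies every constraint that is switched
    on, i.e. the conjunction of (av_i -> c_i) with av_i true for `active`."""
    return any(
        all(CONSTRAINTS[name](*values) for name in active)
        for values in product([False, True], repeat=3)
    )


def minimal_core(names):
    """Return a smallest inconsistent set of activation variables, if any."""
    for size in range(1, len(names) + 1):
        for subset in combinations(sorted(names), size):
            if not satisfiable(subset):
                return set(subset)
    return None  # all constraints together are consistent


print(sorted(minimal_core(CONSTRAINTS)))  # ['c1', 'c2', 'c3']
```

Because the core is reported in terms of the activation names, c4 is immediately seen to play no part in the contradiction.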

#### **4.2 Implementation Testing**

The *Implementation testing* phase analyzes whether a specific JavaScript engine correctly implements the ECMAScript Memory Model. In particular, given a program with shared memory operations, we generate: (1) the set of valid executions, (2) a litmus test, and (3) behavioral coverage constraints.

**Valid Executions.** This analysis lists all of (and only) the behaviors that the (provided) program can exhibit that are consistent with the Memory Model specification. The encoding of the problem is based on the following definition:

> VE(E, AO) := {(RBF, HB, MO, SW) | MM(E, AO, RF, RBF, SW, HB, MO) is SAT}

where VE(E, AO) is the complete (and finite because the program itself is finite) set of possible assignments to the RBF, HB, MO, and SW relations. Each assignment corresponds to a *valid* execution.

**Litmus Tests.** *Litmus test generation* uses the generated list of valid executions to construct a JavaScript program enriched with an assertion that is violated if the output of the program does not match any of the valid executions. A litmus test is executed multiple times (e.g., millions), in order to increase the chance of exposing a problem if there is one.

The result of running a litmus test many times can (in general) have one of three outcomes: the assertion is violated at least once; the assertion is not violated and all possible executions are observed; or the assertion is not violated and only some of the possible executions are observed. More specifically, given a program P, the set of its valid executions *VE*(P), and the set of concrete executions E<sub>N</sub>(P) (obtained by running the JavaScript program on engine E some number of times N), the possible results can be respectively expressed as E<sub>N</sub>(P) \ *VE*(P) ≠ ∅, E<sub>N</sub>(P) = *VE*(P), and E<sub>N</sub>(P) ⊂ *VE*(P).
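The three outcomes correspond to simple set comparisons. The following sketch classifies a run given the set of observed outputs and the set of valid outputs; the function name, the labels, and the example output strings are hypothetical.

```python
def classify(observed, valid):
    """Compare the outputs seen on an engine with the valid outputs."""
    if observed - valid:
        return "violation"   # some observed output is not a valid execution
    if observed == valid:
        return "complete"    # every valid execution was observed
    return "partial"         # only a proper subset was observed


# Made-up outputs of a litmus test with two reads.
valid = {"0,1", "1,0", "1,1"}

print(classify({"1,1", "0,0"}, valid))  # violation
print(classify(set(valid), valid))      # complete
print(classify({"1,1"}, valid))         # partial
```

Only the first case signals a bug candidate; the third case is the one analyzed further through behavioral coverage constraints.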

**Behavioral Coverage Constraints.** Though they can expose bugs, the litmus tests do not provide a guarantee of implementation correctness. In fact, even when a "bug" is found, it could be that the specification is too tight (i.e., it is incompatible with some intended behaviors) rather than that the implementation is wrong. On the other hand, when E<sub>N</sub>(P) ⊂ *VE*(P), and especially if the cardinality of E<sub>N</sub>(P) is significantly smaller than that of *VE*(P), it might be the case that the implementation is too simple: it is not taking sufficient advantage of the weak memory model and is therefore unnecessarily inefficient.

Whenever E<sub>N</sub>(P) ⊂ *VE*(P), this situation can be analyzed by the generation of *Behavioral Coverage Constraints*. The goal of this analysis is to synthesize the formulae Σ<sub>OBS</sub> and Σ<sub>UNOBS</sub>, for observed and unobserved outputs, that restrict the behavior of the memory model in order to match E<sub>N</sub>(P) and *VE*(P) \ E<sub>N</sub>(P), respectively.

Our approach to doing this relies on first choosing a set Π = {π<sub>1</sub>, ..., π<sub>n</sub>} of predicates over which the formula will be constructed. One choice for Π might be all atomic predicates appearing in Eq. (1). Now, let Δ(Π) be the set of all cubes of size n over Π. Formally,

$$\Delta(\Pi) = \{l\_1 \wedge \dots \wedge l\_n \mid \forall \, 1 \le i \le n. \, l\_i \in \{\pi\_i, \neg \pi\_i\}\}.$$

Further, define the observed and unobserved executions as:

$$\begin{array}{lcl}EX\_{OBS} &=& \bigvee\_{\langle RBF,\,HB,\,MO,\,SW\rangle \in E\_N(P)} \big(RBF \wedge HB \wedge MO \wedge SW\big) \\ EX\_{UNOBS} &=& \bigvee\_{\langle RBF,\,HB,\,MO,\,SW\rangle \in VE(P) \setminus E\_N(P)} \big(RBF \wedge HB \wedge MO \wedge SW\big) \end{array}$$

We compute those cubes in Δ(Π) that are consistent with the observed and unobserved executions as follows:

$$\begin{array}{lcl} \delta\_{OBS}(\Pi) &=& \{ \delta \in \Delta(\Pi) \mid MM \wedge EX\_{OBS} \wedge \delta \text{ is satisfiable} \} \\ \delta\_{UNOBS}(\Pi) &=& \{ \delta \in \Delta(\Pi) \mid MM \wedge EX\_{UNOBS} \wedge \delta \text{ is satisfiable} \} \end{array}$$

The cubes are then combined to generate the formulae for matched and unmatched executions:

$$
\Sigma\_{OBS} = \bigvee\_{\delta \in \delta\_{OBS}} \delta, \quad \Sigma\_{UNOBS} = \bigvee\_{\delta \in \delta\_{UNOBS}} \delta.
$$

For example, let (R2H := ∀ e<sub>1</sub>, e<sub>2</sub> ∈ E : RF(e<sub>1</sub>, e<sub>2</sub>) → HB(e<sub>1</sub>, e<sub>2</sub>)) ∈ Π be a predicate expressing that every tuple in *Reads From* is also in *Happens Before*. If the behavioral coverage constraints analysis generates Σ<sub>OBS</sub> = R2H and Σ<sub>UNOBS</sub> = ¬R2H, it means that the JavaScript engine always aligns the *Reads From* relation with the *Happens Before* relation, thus identifying a possible path for optimization in order to take advantage of the (weak) memory model.
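The cube computation can be approximated on explicit data. In the sketch below, each valid execution is a dictionary of relations, each predicate in Π is a Python function, and the cube of an execution is the tuple of truth values of all predicates. Since an execution fixes all relations, the cubes consistent with the observed (or unobserved) executions are simply the cubes those executions realize. The two executions, the second predicate, and all function names are invented for illustration; EMME obtains the cubes through AllSAT queries rather than by direct evaluation.

```python
def r2h(ex):
    """Every Reads From tuple is also a Happens Before tuple."""
    return ex["RF"] <= ex["HB"]


def sw_empty(ex):
    """No Synchronizes With tuples (a made-up extra predicate)."""
    return not ex["SW"]


PREDICATES = [r2h, sw_empty]


def cube(ex):
    """Truth value of every predicate on one execution."""
    return tuple(p(ex) for p in PREDICATES)


def coverage(valid, observed_ids):
    """Cubes realized by observed and by unobserved valid executions."""
    obs = {cube(ex) for name, ex in valid.items() if name in observed_ids}
    unobs = {cube(ex) for name, ex in valid.items() if name not in observed_ids}
    return obs, unobs


def separating(obs, unobs):
    """Predicates that are constant on each side and differ between sides."""
    names = []
    for i, p in enumerate(PREDICATES):
        seen, unseen = {c[i] for c in obs}, {c[i] for c in unobs}
        if len(seen) == 1 and len(unseen) == 1 and seen != unseen:
            names.append(p.__name__)
    return names


# Two hypothetical valid executions of the same program.
VALID = {
    "ex1": {"RF": {("w", "r")}, "HB": {("w", "r")}, "SW": set()},
    "ex2": {"RF": {("w", "r")}, "HB": set(), "SW": set()},
}

obs, unobs = coverage(VALID, observed_ids={"ex1"})
print(separating(obs, unobs))  # ['r2h']
```

Here the engine only ever produced the execution in which R2H holds, so R2H separates observed from unobserved behavior, while `sw_empty` holds on both sides and carries no information.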

#### **5 Alloy Formalization**

Alloy is a widely used modeling language that can be used to describe data structures. The Alloy language is based on relational algebra and has been successfully used in many applications, including the analysis of memory models [14].

We used Alloy to formalize the memory model discussed in Sect. 3.1. We followed the formalization given in Definition 1, using sets and relations to represent each concept.<sup>1</sup> For instance, an operation type is defined as an (abstract) set with three disjoint subsets (R for *Read*, W for *Write*, and M for *ReadModifyWrite*), one for each possible operation. In contrast, blocks and bytes are represented as sets. A memory operation is modeled as a relation which links all of the attributes necessary to describe a memory event.

```
6.3.1.14 happens-before
  ...
  4. For each pair of events E and D in EventSet(execution):
     ...
     c. ...
```
**Fig. 3.** Excerpt of the *Happens Before* definition

The formalization of a natural language specification usually requires multiple attempts and iterations before the intended semantics become clear. In the case of the ECMAScript Memory Model, this process was crucial for disambiguating some of the stated constraints. An example is the *Happens Before* relation. Figure 3 shows an excerpt of its definition, expressing how it is related to the *Agent Order* and *Synchronizes With* relations. One might expect that the formal interpretation would be something like: ∀ (e<sub>1</sub>, e<sub>2</sub>). (AO(e<sub>1</sub>, e<sub>2</sub>) → HB(e<sub>1</sub>, e<sub>2</sub>)) ∧ (SW(e<sub>1</sub>, e<sub>2</sub>) → HB(e<sub>1</sub>, e<sub>2</sub>)) ∧ (...)

However, further analysis and discussions with the people responsible for the Memory Model revealed that the correct interpretation is: ∀ (e<sub>1</sub>, e<sub>2</sub>). HB(e<sub>1</sub>, e<sub>2</sub>) ↔ (AO(e<sub>1</sub>, e<sub>2</sub>) ∨ SW(e<sub>1</sub>, e<sub>2</sub>) ∨ ...). The Alloy formalization of the *Happens Before* relation is shown in Fig. 4. The Active2 predicate evaluates to true when both events are active.
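The practical difference between the two readings can be seen by enumerating candidate *Happens Before* relations over two events. The sketch below only considers the AO and SW disjuncts and ignores the elided remaining cases of the definition; this restriction, the event names, and the helper functions are assumptions made for illustration.

```python
from itertools import chain, combinations

EVENTS = ["e1", "e2"]
PAIRS = [(a, b) for a in EVENTS for b in EVENTS if a != b]
AO = {("e1", "e2")}   # e1 is agent-order before e2
SW = set()            # no synchronization in this tiny example


def subsets(items):
    """Every subset of `items`, as a set."""
    return (set(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)))


def implication_ok(hb):
    # forall pairs: (AO -> HB) and (SW -> HB)
    return AO <= hb and SW <= hb


def biconditional_ok(hb):
    # forall pairs: HB <-> (AO or SW)
    return hb == AO | SW


loose = [hb for hb in subsets(PAIRS) if implication_ok(hb)]
exact = [hb for hb in subsets(PAIRS) if biconditional_ok(hb)]
print(len(loose), len(exact))  # 2 1
```

Under the implication reading, the relation containing both (e1, e2) and (e2, e1) is also accepted, even though nothing justifies e2 happening before e1. The biconditional reading admits only the intended relation.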

<sup>1</sup> The complete Alloy model is available at https://github.com/FMJS/EMME/blob/master/model/memory_model.als.

Once the Memory Model has been formalized, the next step is to combine it with the encoding of the program under analysis. This requires modeling the memory events present in each thread. In the Alloy model, each event in a program extends the set of memory events, and its values are defined as a series of facts. Figure 5 shows an example of the Alloy model for the event ev5W<sup>3</sup> from Fig. 1. A notable aspect of this example is the fact that its activation is dependent on the value of id1_cond, which symbolically represents the condition of the *if-then-else* statement.

```
one sig ev5_W_t3 extends mem_events {}
fact ev5_W_t3_def {(ev5_W_t3.O = W) and
                   (ev5_W_t3.T = NT) and
                   (ev5_W_t3.R = U) and
                   (ev5_W_t3.M = {byte_0}) and
                   ((ev5_W_t3.A = ENABLED) <=> ((id1_cond.value = TRUE))) and
                   (ev5_W_t3.B = x)}
fact ev5_W_t3_in_mem_events {ev5_W_t3 in mem_events}
```
**Fig. 5.** Event ev5W<sup>3</sup> encoding (w.r.t. Fig. 1)

### **6 Implementation**

The techniques described in this paper have been implemented in a tool called EMME: **E**CMAScript **M**emory **M**odel **E**valuator [15]. The tool is written in Python, is open source, and its usage is regulated by a modified BSD license. The input to EMME is a program with shared memory accesses. The tool interacts with the Alloy Analyzer [13] to perform the formal analyses described in Sect. 4, which include the enumeration of valid executions and the generation of behavioral coverage constraints.

**Input Format and Encoding.** The input format of EMME uses a simplified JavaScript-like syntax. It supports the definition of *Read*, *Write*, and *ReadModifyWrite* events, allows events to be atomic or not atomic, and supports operations on integer or floating point values. The input format also supports *if-then-else* and bounded *for-loop* statements, as well as parametric values. An example of an input program is shown in Fig. 6. The program is encoded in Alloy and combined with the memory model in order to provide the input formula for the formal analyses.

```
var x = new SharedArrayBuffer();

Thread t1 {
  x-I8[0] = 1;
  print(x-I16[0]);
}

Thread t2 {
  if (x-I8[0] == 1) {
    x-I8[0] = 3;
  } else {
    x-I8[1] = 3;
  }
}
```
**Fig. 6.** EMME input for the program from Fig. 1.

**Generation of All Valid Executions.** The generation of all valid executions is computed by using Alloy to solve the AllSAT problem. In this case, the distinguishing models of the formula are the assignments to the RBF relation. Thus, after each satisfiability check iteration of the Alloy Analyzer, an additional constraint is added in order to block the current assignment to the RBF relation. This procedure is performed until the model becomes unsatisfiable.
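The blocking loop can be sketched independently of Alloy. In the example below, a brute-force search plays the role of the satisfiability check, two Boolean variables stand in for the RBF relation, and a third variable represents everything that is not part of the distinguishing projection. The formula, the variable names, and the solver are toy assumptions, not the EMME implementation.

```python
from itertools import product

VARIABLES = ["rbf_a", "rbf_b", "aux"]   # hypothetical Boolean variables
PROJECTION = ["rbf_a", "rbf_b"]         # stand-in for the RBF relation


def formula(model):
    """Toy stand-in for the memory model constraints."""
    return model["rbf_a"] or model["rbf_b"]


def solve(blocked):
    """Return one model whose projection is not yet blocked, else None."""
    for values in product([False, True], repeat=len(VARIABLES)):
        model = dict(zip(VARIABLES, values))
        key = tuple(model[v] for v in PROJECTION)
        if formula(model) and key not in blocked:
            return model
    return None  # unsatisfiable under the current blocking constraints


def all_sat():
    """Enumerate distinct projections by blocking each one once found."""
    blocked = []
    while (model := solve(blocked)) is not None:
        blocked.append(tuple(model[v] for v in PROJECTION))
    return blocked


print(all_sat())  # [(False, True), (True, False), (True, True)]
```

The formula has six models in total, but only three distinct projections are reported, because blocking a projection also rules out every model that differs only in `aux`.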

As described in Sect. 3.1, our formal model does not encode the concrete values of each memory operation; thus, the extraction of a valid execution, given a satisfiable assignment to the formula, requires an additional step. This step is to reconstruct the values of each read or modify operation based on the program and the assignment to the RBF relation. For example, given the program in Fig. 1, and assuming that the RBF relation contains the tuples (ev3R<sup>2</sup>, ev2W<sup>2</sup>, 0) and (ev3R<sup>2</sup>, ev6W<sup>3</sup>, 1), the reconstruction of the value read by ev3R<sup>2</sup> depends on the fact that ev2W<sup>2</sup> writes 1 with an 8-bit integer encoding at position 0, while ev6W<sup>3</sup> writes 3 at position 1. The composition of byte 0 and byte 1 from those two writes is the input for the decoding of a 16-bit integer for the event ev3R<sup>2</sup>, resulting in a read of the value 769. Clearly, each event could also have a different size and format (i.e., integer, unsigned integer, or float); thus, the reconstruction of the correct value must also take this into account.
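The arithmetic behind the value 769 can be checked with Python's `struct` module. The sketch assumes a little-endian byte order, which is the order that reproduces the value given in the text; the helper name `read_value` is hypothetical.

```python
import struct


def read_value(byte_sources, fmt):
    """Decode the bytes observed by a read. `byte_sources` lists, in address
    order, the byte taken from each write selected by the RBF relation."""
    return struct.unpack(fmt, bytes(byte_sources))[0]


# ev2W2 stores the 8-bit integer 1 at byte 0; ev6W3 stores 3 at byte 1.
byte0 = struct.pack("<b", 1)[0]
byte1 = struct.pack("<b", 3)[0]

# 16-bit signed read, little-endian (assumed): 1 + 3 * 256.
print(read_value([byte0, byte1], "<h"))  # 769

# The same bytes decoded big-endian would give 1 * 256 + 3 instead.
print(read_value([byte0, byte1], ">h"))  # 259
```

Other sizes and formats follow the same pattern with a different format string, for example `"<f"` for a 32-bit float assembled from four bytes.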

When interpreting a program containing *if-then-else* statements, the possible outcomes must be filtered to exclude executions that break the semantics of *if-then-else*. In particular, it might be the case that the Boolean condition in the model does not match the concrete value, given the read values. For instance, consider the example in Fig. 6, in which the conditional is encoded as a Boolean variable id1_cond representing the statement x-I8[0] == 1. The tool may assign id1_cond to true even though the event x-I8[0] turns out to read a value different from 1 based on the information in the RBF relation. In this case, the execution is discarded since it is not possible given the semantics of the *if-then-else* statement.
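This filtering step amounts to comparing the symbolic branch choice with the reconstructed read value. A minimal sketch, assuming the condition `x-I8[0] == 1` from Fig. 6 and an invented list of candidate executions:

```python
def consistent(cond_assignment, read_value, expected=1):
    """The symbolic condition must agree with the concrete comparison."""
    return cond_assignment == (read_value == expected)


# Hypothetical candidates: the solver's choice for id1_cond together with
# the value reconstructed for the read x-I8[0].
candidates = [
    {"id1_cond": True, "read": 1},
    {"id1_cond": True, "read": 0},
    {"id1_cond": False, "read": 1},
    {"id1_cond": False, "read": 0},
]

kept = [c for c in candidates if consistent(c["id1_cond"], c["read"])]
print(kept)
```

Only the first and the last candidate survive; the other two take a branch that contradicts the value actually read.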

*Graph Representation of the Results.* For each valid execution, EMME will produce a graphviz file that provides a graphical representation of the assignments to the main relations and read values. An example of this graphical representation is shown in Fig. 7. The default setup removes some redundant information such as the explicit transitive closure of the HB relation, while RF and AO are not represented, and the total order MO is reported in the top right corner. Black arrows are used to represent the HB relation, while red and blue are respectively used for RBF and SW. Figure 7(a) represents an execution where event ev4_R_t3 reads value 1 from ev2_W_t2, thus executing the *THEN* branch in the *if-then-else* statement. In contrast, Fig. 7(b) reports an execution where it reads 0, thus taking the *ELSE* branch.

*Litmus Test Generation.* The generation of all valid executions also constructs a JavaScript litmus test that can be used to evaluate whether the engine respects the semantics of the Memory Model. The structure of the litmus test mirrors that of the input program, but the syntax follows the official Test262 ECMAScript conformance standard [11].

To check whether a test produced a valid result, the results of memory operations must be collected. The basic idea consists of printing the values of each read and collecting them all at the thread level. The main thread is then responsible for collecting all the results. The sorted report is then compared with the set of expected outputs using an assertion. Moreover, the test contains a part that is parsed by the Litmus script, which is provided along with the EMME tool, and provides a list of expected outputs. The Litmus script is used to facilitate the execution of multiple runs of the same test, and it will provide a summary of the results as well as a warning whenever one of the executions observed is not valid according to the standard.

**Fig. 7.** Memory model interpretations of the program in Fig. 6.

**Generation of the Behavioral Coverage Constraints.** As described in Sect. 6, for each assignment to the RBF relation, it is possible to construct a concrete value for each memory event. Thus, for each RBF assignment in a set of valid executions for a given program, we can determine the output of the corresponding litmus test. By running the litmus test many times on a JavaScript engine, it is then possible to determine which assignments to the RBF relation have been matched. We denote these MA_rbf<sub>1</sub>, ..., MA_rbf<sub>n</sub>. The unmatched assignments to RBF can be determined simply by removing the matched ones from the set of all valid executions. We denote the unmatched ones UN_rbf<sub>1</sub>, ..., UN_rbf<sub>m</sub>.

As described in Sect. 4, the generation of separation constraints that distinguish between matched and unmatched executions first requires the definition of a set of predicates Π. The extraction of the separation constraints is based on an AllSAT call for matched and unmatched results. The former is shown in (2), and consists of extracting all assignments to the predicates Π such that the models of the RBF relation are consistent with some MA_rbf<sub>i</sub>.

$$\begin{aligned} \text{ALLSAT}\_{\Pi}[MM(E, AO, RBF, \dots) \land (E = BE\_{E}) \land (AO = BE\_{AO})\\ \land (\bigvee\_{i=1, \dots, n} RBF = \text{MA\_rbf}\_{i})] \end{aligned} \tag{2}$$

Similarly, the evaluation for the unmatched executions performs an AllSAT analysis for the formula reported in (3). The results of these two calls to the solver produce, respectively, the formulae Σ<sub>OBS</sub> and Σ<sub>UNOBS</sub> as described in Sect. 4.

$$\begin{aligned} \text{ALLSAT}\_{\Pi}[MM(E, AO, RBF, \dots) \land (E = BE\_{E}) \land (AO = BE\_{AO})\\ \land (\bigvee\_{i=1, \dots, m} RBF = \text{UN\_rbf}\_{i})] \end{aligned} \tag{3}$$

The results from the two AllSAT queries can then be manipulated using a BDD [8] package that produces, in most cases, a smaller formula. After this step, the tool provides a set of formal comparisons that can be done between these two formulas, such as implication, intersection, and disjunction, in order to understand the relation between Σ<sub>OBS</sub> and Σ<sub>UNOBS</sub>.

#### **7 Experimental Evaluations**

In this section, we evaluate the performance of EMME over a set of programs, each containing up to 8 memory events. The analyses can be reproduced using the package available at [16].

*Programs Under Analysis.* In this work, we rely on programs from previous work [6] as well as handcrafted and automatically generated programs. The handcrafted examples are part of the EMME [15] distribution, and they cover a variety of different configurations with 1 to 8 memory events, if-statements, for-loops, and parametric definitions.

The programs from previous work as well as the handcrafted examples cover an interesting set of examples, but provide no particular guarantees on the space of programs that are covered. To overcome this limitation, we implemented a tool that enumerates all possible programs of a fixed size, thus giving us the possibility of generating programs to entirely cover the space of configurations, given a fixed set of events.

The sizes of the programs considered in this evaluation allow us to cover a representative variety of possible event interactions, while preserving a reasonable level of readability of the results. In fact, a program with 8 memory events can have hundreds of valid executions that often require extensive manual effort to understand.

*All Valid Executions.* As described in Sect. 6, the generation of all valid executions is based on a single AllSAT procedure. Figure 8 shows a scalability evaluation when generating all valid executions of 1200 program instances, each with 3 to 8 memory events (200 programs for each configuration). The x-axis refers to the program number, ordered first by number of memory events and then by increasing execution time, while the y-axis reports the execution time (in seconds on an Intel i7-6700 @ 3.4 GHz) on a logarithmic scale. The results show that the proposed approach is able to analyze programs with 7 memory events in fewer than 10 s, providing reasonable responsiveness to deal with small, but informative, programs.

**Fig. 8.** Generation of all valid executions (from 3 to 8 memory events).

*Behavioral Coverage Constraints.* For the coverage constraints analysis, we first extracted a subset of the 1200 tests, considering only the ones that could produce at least 5 different outputs. There were 288 such tests. For each test, we ran the JavaScript engine 500 times, and performed an analysis using 11 predicates, each of which corresponds to a sub-part of the Memory Model, as well as some additional formulae. During this evaluation, the average computation time required to perform the behavioral coverage constraints analysis was 3.25 s, with a variance of 0.37 s.

### **8 Results of the Formal Analyses**

In this section, we provide an overview of the results of the formal analyses for the ECMAScript Memory Model.

**Circular relations definition.** In the original Memory Model, a subset of the relations were specified using circular definitions. More specifically, using the notation a → b as "the definition of a depends on b", the loop was *Synchronizes With* → *Reads From* → *Reads Bytes From* → *Happens Before* → *Synchronizes With*. Cyclic definitions can result in vacuous constraints, and in the case of binary relations, this manifests as solutions with unconstrained tuples that belong to all relations involved in the cycle. In order to solve this problem, the definition of *Reads Bytes From* was changed so that it no longer depends on *Happens Before*. In addition, the memory model was extended with a property called *Valid Coherent Reads* that constrains the possible tuples belonging to the *Reads Bytes From* relation.

**Misalignment of the ComposeWriteEventBytes.** The memory model defines a *Reads Bytes From* relation, and checks whether the tuples belonging to it are valid by relying on a function called *ComposeWriteEventBytes*. Given a list of writes, the *ComposeWriteEventBytes* function creates a vector of values associated with a read event; however, the index for each write event was not correct, resulting in a misalignment w.r.t. the *Reads Bytes From* relation. An additional offset was added in order to fix the problem.

**Distinct events quantification.** Another problem encountered while analyzing the ECMAScript memory model was caused by a series of inconsistent constraints. One example of inconsistency was in the definition of the *Happens Before* relation, which prescribes that for any two events $ev_1$ and $ev_2$ with overlapping ranges, whenever $ev_1$ is of type *Init*, $ev_2$ should be of a different type (i.e., not *Init*). However, there was no constraint stating that $ev_1$ and $ev_2$ have to be distinct, and certainly, whenever $ev_1$ and $ev_2$ are not distinct then this expression is unsatisfiable.
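The effect of the missing distinctness requirement can be reproduced on a toy instance. The sketch below is only an illustration: the `Event` record, its fields, and the two quantification helpers are hypothetical and are not part of the ECMAScript specification or of the tool described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    # Hypothetical, simplified event record (not the specification's data model).
    name: str
    kind: str    # e.g. "Init" or "Write"
    start: int   # first byte index touched
    length: int  # number of bytes touched


def ranges_overlap(a: Event, b: Event) -> bool:
    return a.start < b.start + b.length and b.start < a.start + a.length


def init_constraint(a: Event, b: Event) -> bool:
    """If a and b overlap and a is Init, then b must not be Init."""
    if ranges_overlap(a, b) and a.kind == "Init":
        return b.kind != "Init"
    return True


def holds_for_all_pairs(events: List[Event]) -> bool:
    # Quantifies over every pair, including pairs (e, e).
    return all(init_constraint(a, b) for a in events for b in events)


def holds_for_distinct_pairs(events: List[Event]) -> bool:
    # Quantifies over pairs of distinct events only.
    return all(init_constraint(a, b) for a in events for b in events if a is not b)
```

With a single *Init* event and one overlapping write, the first helper fails because the *Init* event is paired with itself, while the second succeeds.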

A similar inconsistency was found in the definition of the *Memory Order* relation. In this case, if the SW relation contains the pair $(ev_1, ev_2)$, and $(ev_1, ev_2) \in \mathrm{HB}$, then the MO should contain $(ev_1, ev_2)$. However, this is inconsistent with another constraint requiring that no event $ev_3$ should exist operating on the same memory addresses as $ev_2$ such that both $(ev_1, ev_3) \in \mathrm{MO}$ and $(ev_3, ev_2) \in \mathrm{MO}$. This constraint is false when $ev_1 = ev_2 = ev_3$. Both the *Happens Before* and the *Memory Order* relations initially permitted any pairs of elements to be related (including two equal elements). The solution was to only allow pairs of distinct events in these relations.

The definition of the *Reads Bytes From* relation stated that each read or modify event ev1R is associated with a list of pairs of byte indices and write or modify events. The definition did not specifically preclude allowing modify events to read from themselves. This does not cause any particular issues at the formal model level, but it is not clear what the implication at the JavaScript engine implementation level would be. In order to resolve this issue, the definition of the *Reads Bytes From* relation was modified to allow only events that are distinct to be related by *Reads Bytes From*.

**Outputs coverage on ECMAScript engines.** As described in Sect. 4, the litmus test analysis can result in three possible outcomes: $Ex(P) \setminus \mathit{VE}(P) \neq \emptyset$ when the engine violates the specification, $Ex(P) = \mathit{VE}(P)$ when the engine matches the specification, and $Ex(P) \subset \mathit{VE}(P)$ when the engine is more restrictive than the specification. Typically, such an analysis is designed to find bugs in the software implementation of the memory model [4,6], focusing on the first case ($Ex(P) \setminus \mathit{VE}(P) \neq \emptyset$). However, in this project, the last case was most prevalent, where $Ex(P)$ is significantly smaller than $\mathit{VE}(P)$.

For instance, when we ran the 288 examples with at least 5 possible outputs (from Sect. 7) 1000 times for each combination of program and JavaScript engine, the overall output coverage reached 75%, but for 1/6 of the examples, the coverage did not exceed 50%, and some were even below 15%<sup>2</sup>.
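The three outcomes and the coverage figure can be expressed with plain set operations. The following is a minimal sketch, assuming that observed and valid outputs are available as sets of hashable values; the function names `classify` and `coverage` are illustrative and do not refer to the actual tool.

```python
from typing import Hashable, Set


def classify(observed: Set[Hashable], valid: Set[Hashable]) -> str:
    """Compare observed engine outputs Ex(P) with valid outputs VE(P)."""
    if observed - valid:
        # Some observed output is not allowed by the memory model.
        return "violation"
    if observed == valid:
        return "match"
    # Every observed output is valid, but some valid outputs never appeared.
    return "more restrictive"


def coverage(observed: Set[Hashable], valid: Set[Hashable]) -> float:
    """Fraction of the valid outputs that were actually observed."""
    if not valid:
        raise ValueError("the set of valid outputs must not be empty")
    return len(observed & valid) / len(valid)
```

Under this reading, a test whose engine runs produce only one of two valid outputs is classified as "more restrictive" with a coverage of 0.5.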

This situation (frequently having far fewer observed behaviors than allowed behaviors) guided our development of alternative analyses, such as the generation of the behavioral coverage constraints, to help developers understand the relationship between an engine's implementation and the memory model specification. Future improvements of JavaScript engines will likely be less conservative, meaning that more behaviors will be covered. The tests produced in this project will be essential to ensure that no bugs are introduced. Currently, we are in the process of adapting the litmus tests so that they can be included as part of the official TEST262 test suite for the ECMAScript Memory Model.

#### **9 Conclusion**

Extending JavaScript, the language used by nearly all web-based interfaces, to support shared memory operations warrants the use of extensive verification techniques. In this work, we have presented a tool that has been developed in order to support the design and development of the ECMAScript Memory Model. The formal analysis of the original specification allowed us to identify a number of potential issues and inconsistencies. The evaluation of the valid executions and litmus tests coverage analysis identified a conservative level of optimization in current engine implementations. This situation motivated us to develop a specific technique for understanding differences between the Memory Model specification and JavaScript engine implementations.

<sup>2</sup> On an x86 machine, and with the latest version of the engines available on October 1st, 2017.

Future extensions to this work will consider providing additional techniques to help developers improve code optimizations in JavaScript engines. Techniques such as the synthesis of equivalent programs, and automated value instantiation given a parametric program will provide additional analytical capabilities able to identify possible directions for code optimization. Moreover, we will also consider integration with other constraint solving engines in order to deal with more complex programs.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# SAT and SMT II

## **What a Difference a Variable Makes**

Marijn J. H. Heule<sup>1(✉)</sup> and Armin Biere<sup>2</sup>

<sup>1</sup> Department of Computer Science, The University of Texas, Austin, USA marijn@heule.nl

<sup>2</sup> Institute for Formal Models and Verification, JKU, Linz, Austria

**Abstract.** We present an algorithm and tool to convert derivations from the powerful recently proposed PR proof system into the widely used DRAT proof system. The PR proof system allows short proofs without new variables for some hard problems, while the DRAT proof system is supported by top-tier SAT solvers. Moreover, there exist efficient, formally verified checkers of DRAT proofs. Thus our tool can be used to validate PR proofs using these verified checkers. Our simulation algorithm uses only one new Boolean variable and the size increase is at most quadratic in the size of the propositional formula and the PR proof. The approach is evaluated on short PR proofs of hard problems, including the well-known pigeon-hole and Tseitin formulas. Applying our tool to PR proofs of pigeon-hole formulas results in short DRAT proofs, linear in size with respect to the size of the input formula, which have been certified by a formally verified proof checker.

### **1 Introduction**

Satisfiability (SAT) solvers are powerful tools for many applications in formal methods and artificial intelligence [3,9]. Arguably the most effective new techniques in recent years are based on *inprocessing* [21,25]: Interleaving preprocessing techniques and conflict-driven clause learning (CDCL) [26]. Several powerful inprocessing techniques, such as symmetry breaking [1,6] and blocked clause addition [23], do not preserve logical equivalence and cannot be expressed compactly using classical resolution proofs [30]. The RAT proof system [14] was designed to express such techniques succinctly and facilitate efficient proof validation. All top-tier SAT solvers support proof logging in the DRAT proof system [12], which extends the RAT proof system with clause deletion.

More recently a ground-breaking paper [8] presented at TACAS'17 showed how to efficiently certify huge propositional proofs of unsatisfiability by proof checkers, which are formally verified by theorem provers, such as ACL2 [7], Coq [7,8], and Isabelle/HOL [24]. These developments are clearly a breakthrough in SAT solving. They allow us to have the same trust in the correctness of the results produced by a highly tuned state-of-the-art SAT solver as in those claims deduced with proof producing theorem provers. We can now use SAT solvers as part of such fully trusted proof generating systems.

Supported by the National Science Foundation (NSF) under grant CCF-1526760 and by the Austrian Science Fund (FWF) under project S11409-N23 (RiSE).

© The Author(s) 2018

D. Beyer and M. Huisman (Eds.): TACAS 2018, LNCS 10806, pp. 75–92, 2018. https://doi.org/10.1007/978-3-319-89963-3\_5

On the other hand, with even more powerful proof systems we can produce even smaller proofs. The goal in increasing the power of proof systems is to compactly cover existing reasoning techniques that are not yet covered, e.g., algebraic reasoning, but also to provide a framework for investigating new inprocessing techniques. If proofs are required, then this is a necessary condition for solving certain formulas faster. However, it makes proof checking more challenging. The recently proposed PR proof system [17] (best paper at CADE'17) is such a generalization of the RAT proof system, actually an instance of the most general way of defining a clausal proof system based on clause redundancy.

There are short PR proofs without new variables for some hard formulas [17]. Some of them can be found automatically [18]. The PR proof system can therefore reveal new powerful inprocessing techniques. Short proofs for hard formulas in the RAT proof system likely require many new variables, making it difficult to find them automatically. The question whether PR proofs can efficiently be converted into proofs in the RAT and DRAT proof systems has been open. In this paper, we give a positive answer and present a conversion algorithm that in the worst case results in a quadratic blowup in size. Surprisingly only a single new Boolean variable is required to convert PR proofs into DRAT proofs.

At this point there exists only an unverified checker to validate PR proofs, written in C. In order to increase the trust in the correctness of PR proofs, we implemented a tool, called PR2DRAT, to convert PR proofs into DRAT proofs, which in turn can be validated using verified proof checkers. Thanks to various optimizations, the size increase during conversion is rather modest on available PR proofs, thereby making this a useful certification approach in practice.

#### **Contributions**


#### **Structure**

After preliminaries in Sect. 2 we elaborate on clausal proof systems in Sect. 3 also taking the idea of deletion steps into account. Then Sect. 4 describes and analyzes our simulation algorithm. In Sect. 5 we present how to optimize our new algorithm for special cases followed by alternative simulation algorithms in Sect. 6. Experiments are presented in Sect. 7 before we conclude with Sect. 8.

### **2 Preliminaries**

Below we present the most important background concepts related to this paper.

*Propositional Logic.* Propositional formulas in *conjunctive normal form* (CNF) are the focus of this paper. A *literal* is either a variable $x$ (a *positive literal*) or the negation $\bar{x}$ of a variable $x$ (a *negative literal*). The *complementary literal* $\bar{l}$ of a literal $l$ is defined as $\bar{l} = \bar{x}$ if $l = x$ and $\bar{l} = x$ if $l = \bar{x}$. A *clause* $C$ is a disjunction of literals. A *formula* $F$ is a conjunction of clauses. For a literal, clause, or formula $\varphi$, $\mathit{var}(\varphi)$ denotes the variables in $\varphi$. We treat $\mathit{var}(\varphi)$ as a variable if $\varphi$ is a literal, and as a set of variables otherwise.

*Satisfiability.* An *assignment* is a (partial) function from a set of variables to the truth values 1 (*true*) and 0 (*false*). An assignment is *total* w.r.t. a formula if it assigns a truth value to all variables occurring in the formula. We extend a given $\alpha$ to an assignment over literals, clauses and formulas in the natural way. Let $\varphi$ be a literal, clause, or formula. Then $\varphi$ is *satisfied* if $\alpha(\varphi) = 1$ and *falsified* if $\alpha(\varphi) = 0$. Otherwise $\varphi$ is *unassigned*. In particular, $\bar{x}$ is satisfied if $x$ is falsified by $\alpha$ and vice versa. A clause is satisfied by $\alpha$ if it contains a literal that is satisfied by $\alpha$ and falsified if all its literals are falsified. Finally a formula is satisfied by $\alpha$ if all its clauses are satisfied by $\alpha$. We often denote assignments by sequences of literals they satisfy. For instance, $x\,\bar{y}$ denotes the assignment that assigns 1 to $x$ and 0 to $y$. For an assignment $\alpha$, $\mathit{var}(\alpha)$ denotes the variables assigned by $\alpha$. Further, $\alpha_l$ denotes the assignment obtained from $\alpha$ by flipping the truth value of literal $l$ assuming it is assigned. A formula is *satisfiable* if there exists an assignment that satisfies it and *unsatisfiable* otherwise.

*Formula Simplification.* We denote the empty clause by $\bot$ and by $\top$ the valid and always satisfied clause. A clause is a *tautology* if it contains a literal $l$ and its negation $\bar{l}$. Given assignment $\alpha$ and clause $C$, we define $C|_\alpha = \top$ if $\alpha$ satisfies $C$; otherwise, $C|_\alpha$ denotes the result of removing from $C$ all the literals falsified by $\alpha$. For a formula $F$, we define $F|_\alpha = \{C|_\alpha \mid C \in F \text{ and } C|_\alpha \neq \top\}$. We say that an assignment $\alpha$ *touches* a clause $C$ if $\mathit{var}(\alpha) \cap \mathit{var}(C) \neq \emptyset$. A *unit clause* is a clause with only one literal. The result of applying the *unit clause rule* to a formula $F$ is the formula $F|_l$ where $(l)$ is a unit clause in $F$. The iterated application of the unit clause rule to a formula, until no unit clauses are left, is called *unit propagation*. If unit propagation yields the empty clause $\bot$, we say that it derived a *conflict*. Given two clauses $(l \lor C)$ and $(\bar{l} \lor D)$ their *resolvent* is $C \lor D$. If further $D \subseteq C$, *self-subsuming literal elimination* (SSLE) allows removing $l$ from $(l \lor C)$. Notice that $C$ is the resolvent of $(l \lor C)$ and $(\bar{l} \lor D)$. So an SSLE step can be seen as two operations, learning the resolvent $C$ followed by the removal of $(l \lor C)$, which is subsumed by $C$. The reverse of SSLE is *self-subsuming literal addition* (SSLA), which can add a literal $l$ to a clause $C$ in the presence of a clause $(\bar{l} \lor D)$ with $D \subseteq C$. The notion of SSLE first appeared in [10] and is a special case of *asymmetric literal elimination* (ALE), which in turn is the inverse of *asymmetric literal addition* (ALA) [16].
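The restriction of a formula under an assignment and unit propagation can be sketched in a few lines. The encoding is an assumption made for illustration only: literals are DIMACS-style non-zero integers (`3` for a variable, `-3` for its negation), clauses are `frozenset`s of literals, and an assignment is the set of literals it satisfies. The function names are hypothetical.

```python
from typing import FrozenSet, Iterable, Optional, Set, Tuple

Clause = FrozenSet[int]  # e.g. frozenset({1, -2}) stands for (x1 or not x2)


def restrict_clause(clause: Clause, alpha: Set[int]) -> Optional[Clause]:
    """Return C|alpha, or None (standing for the clause 'top') if alpha satisfies C."""
    if any(lit in alpha for lit in clause):
        return None
    # Drop the literals that alpha falsifies.
    return frozenset(lit for lit in clause if -lit not in alpha)


def restrict(formula: Iterable[Clause], alpha: Set[int]) -> Set[Clause]:
    """Return F|alpha: the reduced clauses of F that alpha does not satisfy."""
    reduced = set()
    for clause in formula:
        r = restrict_clause(clause, alpha)
        if r is not None:
            reduced.add(r)
    return reduced


def unit_propagate(formula: Iterable[Clause]) -> Tuple[Set[Clause], Set[int], bool]:
    """Apply the unit clause rule until no unit clause is left.

    Returns the simplified formula, the propagated literals, and a flag
    telling whether the empty clause (a conflict) was derived.
    """
    current = set(formula)
    propagated: Set[int] = set()
    while True:
        if frozenset() in current:
            return current, propagated, True
        unit = next((c for c in current if len(c) == 1), None)
        if unit is None:
            return current, propagated, False
        (lit,) = unit
        propagated.add(lit)
        current = restrict(current, {lit})
```

Representing $\top$ by `None` keeps satisfied clauses out of the reduced formula, which matches the condition $C|_\alpha \neq \top$ in the definition of $F|_\alpha$.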

Clause $C$ is *blocked* on literal $l \in C$ w.r.t. a formula $F$, if all resolvents of $C$ and $D \in F$ with $\bar{l} \in D$ are tautologies. If a clause $C \in F$ is blocked w.r.t. $F$, $C$ can be removed from $F$ while preserving satisfiability. If a clause $C \notin F$ is blocked w.r.t. $F$, then $C$ can be added to $F$ while preserving satisfiability.
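The blockedness condition is a purely syntactic check over resolvents. The sketch below assumes the same illustrative encoding as DIMACS (literals as non-zero integers, clauses as `frozenset`s); the helper names are hypothetical.

```python
from typing import FrozenSet, Iterable

Clause = FrozenSet[int]


def resolvent(c: Clause, d: Clause, lit: int) -> Clause:
    """Resolve c (containing lit) with d (containing -lit) on lit."""
    return (c - {lit}) | (d - {-lit})


def is_tautology(clause: Clause) -> bool:
    return any(-lit in clause for lit in clause)


def is_blocked(clause: Clause, lit: int, formula: Iterable[Clause]) -> bool:
    """Check whether `clause` is blocked on `lit` w.r.t. `formula`."""
    if lit not in clause:
        raise ValueError("the blocking literal must occur in the clause")
    return all(
        is_tautology(resolvent(clause, d, lit))
        for d in formula
        if -lit in d
    )
```

A clause that contains a literal over a variable not occurring in the formula is trivially blocked on that literal, because there is no clause to resolve with.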

*Formula Relations.* Two formulas are *logically equivalent* if they are satisfied by the same assignments. Two formulas are *satisfiability equivalent* if they are either both satisfiable or both unsatisfiable. Given two formulas $F$ and $F'$, we denote by $F \vDash F'$ that $F$ implies $F'$, i.e., all assignments satisfying $F$ also satisfy $F'$. Furthermore, by $F \vdash_1 F'$ we denote that for every clause $(l_1 \lor \dots \lor l_n) \in F'$, unit propagation on $F \land (\bar{l}_1) \land \dots \land (\bar{l}_n)$ derives a conflict. If $F \vdash_1 F'$, we say that $F$ implies $F'$ through unit propagation. For example, $(\bar{x}) \land (y) \vdash_1 (\bar{x} \lor z) \land (y)$, since unit propagation of the unit clauses $(x)$ and $(\bar{z})$ derives a conflict with $(\bar{x})$, and unit propagation of $(\bar{y})$ derives a conflict with $(y)$.
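Implication through unit propagation can be checked clause by clause. The sketch below is an illustration under assumed conventions: literals are non-zero integers, clauses are `frozenset`s, and the function names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set

Clause = FrozenSet[int]


def up_conflict(clauses: List[Clause]) -> bool:
    """Return True iff unit propagation on `clauses` derives the empty clause."""
    assignment: Set[int] = set()
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assignment for lit in clause):
                continue  # clause already satisfied
            open_lits = [lit for lit in clause if -lit not in assignment]
            if not open_lits:
                return True  # all literals falsified: conflict
            if len(open_lits) == 1:
                assignment.add(open_lits[0])  # unit clause rule
                changed = True
    return False


def implies_by_up(f: Iterable[Clause], g: Iterable[Clause]) -> bool:
    """Check that f implies every clause of g through unit propagation."""
    base = list(f)
    return all(
        up_conflict(base + [frozenset({-lit}) for lit in clause])
        for clause in g
    )
```

With $x$, $y$, $z$ encoded as 1, 2, 3, the formula $(\bar{x}) \land (y)$ implies $(\bar{x} \lor z) \land (y)$ in this sense, whereas $(x \lor y)$ does not imply $(x)$.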

### **3 Clausal Proof Systems**

In this section, we introduce a formal notion of clause redundancy and demonstrate how it provides the basis for clausal proof systems. We start by introducing clause redundancy [22]:

**Definition 1.** *A clause $C$ is* redundant *w.r.t. a formula $F$ if $F$ and $F \cup \{C\}$ are satisfiability equivalent.*

For instance, the clause $C = (x \lor y)$ is redundant w.r.t. $F = (\bar{x} \lor \bar{y})$ since $F$ and $F \cup \{C\}$ are satisfiability equivalent (although they are not logically equivalent). Since this notion of redundancy allows us to add redundant clauses to a formula without affecting its satisfiability, it gives rise to clausal proof systems.

**Definition 2.** *For $n \in \mathbb{N}$, a* derivation *of a formula $F_n$ from a formula $F_0$ is a sequence of $n$ triples $(d_1, C_1, \omega_1), \dots, (d_n, C_n, \omega_n)$, where each clause $C_i$ for $1 \le i \le n$ is redundant w.r.t. $F_{i-1} \setminus \{C_i\}$ with $F_i = F_{i-1} \cup \{C_i\}$ if $d_i = 0$ and $F_i = F_{i-1} \setminus \{C_i\}$ if $d_i = 1$. The assignment $\omega_i$ acts as (arbitrary)* witness *of the redundancy of $C_i$ w.r.t. $F_{i-1}$ and we call the number $n$ of* steps *also the* length *of the derivation. A derivation is a* refutation *of $F_0$ if $d_n = 0$ and $C_n = \bot$. A derivation is a* proof of satisfaction *of $F_0$ if $F_n$ equals the empty formula.*

If there exists such a derivation of a formula $F'$ from a formula $F$, then $F$ and $F'$ are satisfiability equivalent. Further a refutation of a formula $F$, as defined above, obviously certifies the unsatisfiability of $F$ since any $F'$ containing the empty clause is unsatisfiable. Note that at this point these $\omega_i$ are still place-holders used in refinements, i.e., in the RAT and PR proof systems defined below, where these $\omega_i$ are witnesses for the redundancy of $C_i$ w.r.t. $F_{i-1}$. In these specialized proof systems this redundancy can be *checked efficiently*, i.e., in polynomial time w.r.t. the size of $C_i$, $F_{i-1}$ and $\omega_i$.
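The bookkeeping part of a derivation, i.e., how the formula evolves under addition and deletion steps, can be sketched as follows. This sketch deliberately does not check redundancy of the individual steps; it only replays them. The encoding (integer literals, `frozenset` clauses, triples `(d, clause, witness)`) and the function names are assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Clause = FrozenSet[int]
Step = Tuple[int, Clause, Set[int]]  # (d_i, C_i, omega_i)


def apply_derivation(f0: Iterable[Clause], steps: List[Step]) -> Set[Clause]:
    """Replay the steps on a copy of F_0 and return F_n (no redundancy check)."""
    formula = set(f0)
    for d, clause, _witness in steps:
        if d == 0:
            formula.add(clause)      # addition step
        elif d == 1:
            formula.discard(clause)  # deletion step
        else:
            raise ValueError("d must be 0 (addition) or 1 (deletion)")
    return formula


def has_refutation_shape(steps: List[Step]) -> bool:
    """The last step adds the empty clause."""
    return bool(steps) and steps[-1][0] == 0 and steps[-1][1] == frozenset()


def has_satisfaction_shape(f0: Iterable[Clause], steps: List[Step]) -> bool:
    """The final formula is the empty formula."""
    return not apply_derivation(f0, steps)
```

A checker for a concrete proof system would additionally verify, before each step, that the clause is redundant in the sense required by that system.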

#### **3.1 The RAT Proof System**

The RAT proof system allows the addition of a redundant clause, which is a so-called *resolution asymmetric tautology* [21] (RAT, defined below). It can be efficiently checked whether a clause is a RAT. The following definition of RAT is equivalent to the original one in [21] based on resolvents using results from [17].

**Definition 3.** *Let $F$ be a formula, $C$ a clause, and $\alpha$ the smallest assignment that falsifies $C$. Then, $C$ is a* resolution asymmetric tautology (RAT) *with respect to $F$ if there exists a literal $l \in C$ such that $F|_\alpha \vdash_1 F|_{\alpha_l}$. We say that $C$ is a* RAT *on $l$ w.r.t. $F$. The empty clause $\bot$ is a* RAT *w.r.t. $F$ iff $F \vdash_1 \bot$.*

Informally, $F|_\alpha \vdash_1 F|_{\alpha_l}$ means that $F|_{\alpha_l}$ is at least as satisfiable as $F|_\alpha$. We know that $\alpha_l$ satisfies $C$ as $l \in C$, thus $F|_{\alpha_l} = (F \land C)|_{\alpha_l}$. Hence, if $F$ has a satisfying assignment $\beta$ that falsifies $C$, which necessarily is an extension of $\alpha$, then it also satisfies $(F \land C)|_{\alpha_l}$, and thus there exists a satisfying assignment of $F$ that satisfies $C$, obtained from $\beta$ by flipping the assigned value of $l$.

*Example 1.* Let $F = (x \lor y) \land (\bar{x} \lor y) \land (\bar{x} \lor z)$ and $C = (x \lor \bar{z})$. Then, $\alpha = \bar{x}\,z$ is the smallest assignment that falsifies $C$. Observe that $C$ is a RAT clause on literal $x$ w.r.t. $F$. First, $\alpha_x = x\,z$. Now, consider $F|_\alpha = (y)$ and $F|_{\alpha_x} = (y)$. Clearly, unit propagation on $F|_\alpha \land (\bar{y})$ derives a conflict, thus $F|_\alpha \vdash_1 F|_{\alpha_x}$.
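The RAT condition of Definition 3 can be checked mechanically. The sketch below assumes the reading $F = (x \lor y) \land (\bar{x} \lor y) \land (\bar{x} \lor z)$ and $C = (x \lor \bar{z})$ for the example, with $x$, $y$, $z$ encoded as the integers 1, 2, 3; the encoding and all function names are illustrative assumptions, and the special case of the empty clause is not handled.

```python
from typing import FrozenSet, Iterable, List, Set

Clause = FrozenSet[int]


def restrict(formula: Iterable[Clause], alpha: Set[int]) -> Set[Clause]:
    """F|alpha: drop satisfied clauses, remove falsified literals."""
    return {
        frozenset(lit for lit in clause if -lit not in alpha)
        for clause in formula
        if not any(lit in alpha for lit in clause)
    }


def up_conflict(clauses: List[Clause]) -> bool:
    """Return True iff unit propagation derives the empty clause."""
    assignment: Set[int] = set()
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assignment for lit in clause):
                continue
            open_lits = [lit for lit in clause if -lit not in assignment]
            if not open_lits:
                return True
            if len(open_lits) == 1:
                assignment.add(open_lits[0])
                changed = True
    return False


def implies_by_up(f: Iterable[Clause], g: Iterable[Clause]) -> bool:
    base = list(f)
    return all(
        up_conflict(base + [frozenset({-lit}) for lit in clause]) for clause in g
    )


def is_rat(formula: Iterable[Clause], clause: Clause, lit: int) -> bool:
    """Check that the non-empty `clause` is a RAT on `lit` w.r.t. `formula`."""
    formula = list(formula)
    alpha = {-l for l in clause}            # smallest assignment falsifying C
    alpha_l = (alpha - {-lit}) | {lit}      # alpha with the value of lit flipped
    return implies_by_up(restrict(formula, alpha), restrict(formula, alpha_l))
```

Under the same encoding, the unit clause $(x)$ is not a RAT on $x$ w.r.t. this formula, since the restriction under $x$ contains the clause $(z)$, which $(y)$ does not imply through unit propagation.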

In a RAT derivation $(d_1, C_1, \omega_1), \dots, (d_n, C_n, \omega_n)$ all $d_i$'s are zero (additions). Let $\alpha_i$ denote the smallest assignment that falsifies $C_i$ and let $l_i \in C_i$ be a literal on which $C_i$ is a RAT w.r.t. $F_{i-1}$. Each witness $\omega_i$ in a RAT derivation equals $(\alpha_i)_{l_i}$, which is obtained from $\alpha_i$ by flipping the value of $l_i$.

#### **3.2 The PR Proof System**

As discussed, addition of PR clauses (short for *propagation-redundant clauses*) to a formula can lead to short proofs for hard formulas without the introduction of new variables. Although PR as well as RAT clauses are not necessarily implied by the formula, their addition preserves satisfiability [17]. The intuitive reason for this is that the addition of a PR clause prunes the search space of possible assignments in such a way that there still remain assignments under which the formula is as satisfiable as under the pruned assignments.

**Definition 4.** *Let $F$ be a formula, $C$ a non-empty clause, and $\alpha$ the smallest assignment that falsifies $C$. Then, $C$ is* propagation redundant (PR) *with respect to $F$ if there exists an assignment $\omega$ which satisfies $C$, such that $F|_\alpha \vdash_1 F|_\omega$.*

The clause $C$ can be seen as a constraint that "prunes" from the search space all assignments that extend $\alpha$. Note again, that in our setting assignments are in general partial functions. Since $F|_\alpha$ implies $F|_\omega$, every assignment that satisfies $F|_\alpha$ also satisfies $F|_\omega$, meaning that $F$ is at least as satisfiable under $\omega$ as it is under $\alpha$. Moreover, since $\omega$ satisfies $C$, it must disagree with $\alpha$ on at least one variable. We refer to $\omega$ as the *witness*, since it witnesses the propagation redundancy of the clause. Consider the following example from [17].

*Example 2.* Let $F = (x \lor y) \land (\bar{x} \lor y) \land (\bar{x} \lor z)$, $C = (x)$, and let $\omega = x\,z$ be an assignment. Then, $\alpha = \bar{x}$ is the smallest assignment that falsifies $C$. Now, consider $F|_\alpha = (y)$ and $F|_\omega = (y)$. Clearly, unit propagation on $F|_\alpha \land (\bar{y})$ derives a conflict. Thus, $F|_\alpha \vdash_1 F|_\omega$ and $C$ is propagation redundant w.r.t. $F$. Notice that $C$ is not RAT w.r.t. $F$ as $(y) = F|_\alpha \nvdash_1 F|_{\alpha_x} = (y) \land (z)$.
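Checking the PR condition for a given witness follows Definition 4 directly. The sketch below assumes the reading $F = (x \lor y) \land (\bar{x} \lor y) \land (\bar{x} \lor z)$, $C = (x)$, $\omega = x\,z$ for the example, with $x$, $y$, $z$ encoded as 1, 2, 3; the encoding and the function names are illustrative assumptions rather than the interface of an existing checker.

```python
from typing import FrozenSet, Iterable, List, Set

Clause = FrozenSet[int]


def restrict(formula: Iterable[Clause], alpha: Set[int]) -> Set[Clause]:
    """F|alpha: drop satisfied clauses, remove falsified literals."""
    return {
        frozenset(lit for lit in clause if -lit not in alpha)
        for clause in formula
        if not any(lit in alpha for lit in clause)
    }


def up_conflict(clauses: List[Clause]) -> bool:
    """Return True iff unit propagation derives the empty clause."""
    assignment: Set[int] = set()
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assignment for lit in clause):
                continue
            open_lits = [lit for lit in clause if -lit not in assignment]
            if not open_lits:
                return True
            if len(open_lits) == 1:
                assignment.add(open_lits[0])
                changed = True
    return False


def implies_by_up(f: Iterable[Clause], g: Iterable[Clause]) -> bool:
    base = list(f)
    return all(
        up_conflict(base + [frozenset({-lit}) for lit in clause]) for clause in g
    )


def is_pr(formula: Iterable[Clause], clause: Clause, omega: Set[int]) -> bool:
    """Check that `clause` is PR w.r.t. `formula` with witness `omega`."""
    if not clause or not any(lit in omega for lit in clause):
        return False  # the clause must be non-empty and satisfied by omega
    formula = list(formula)
    alpha = {-lit for lit in clause}  # smallest assignment falsifying C
    return implies_by_up(restrict(formula, alpha), restrict(formula, omega))
```

The witness matters: with $\omega = x\,z$ the check succeeds, while with the RAT-style witness $\omega = x$ it fails, which mirrors the observation that $C$ is PR but not RAT.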

Most known types of redundant clauses are PR clauses [17], including *blocked clauses* [23], *set-blocked clauses* [22], *resolution asymmetric tautologies*, etc.

#### **3.3 The Power of Deletion**

The clausal proof system DRAT [29] is the de-facto standard for proofs of unsatisfiability (refutations) in practice. It extends RAT by allowing the deletion of clauses. The main purpose of clause deletion is to reduce computation cost to validate proofs of unsatisfiability. Note, that SAT solvers not only learn clauses, but also aggressively delete clauses to speed up reasoning. Integrating deletion information in proofs is crucial to speed up proof checking.

In principle, while deleted clause information has to be taken into account to update the formula after a deletion step, one does not need to check the validity of clause deletion steps in order to refute a propositional formula. Simply removing deleted clauses during proof checking trivially preserves unsatisfiability.

Proofs of satisfiability only exist in proof systems that allow and enforce valid deletion steps, because they are required to reduce a formula to the empty formula. In case of propositional formulas, the notion of proofs of satisfiability is probably not useful as a satisfying assignment can be used to certify satisfiability. However, for richer logics, such as quantified Boolean formulas, the proof of satisfiability can be exponentially smaller compared to alternatives [19,20].

#### **4 Conversion Algorithm**

This section presents our main algorithm, which describes how to convert a PR derivation $(0, C_1, \omega_1), \dots, (0, C_n, \omega_n)$ of a formula $F_n$ from a formula $F_0$ into a DRAT derivation $(d_1, D_1, \omega'_1), \dots, (d_m, D_m, \omega'_m)$ of $G_m = F_n$ from $G_0 = F_0$. Each PR proof step adds a clause to the formula. Let $G_0$ be a copy of $F_0$ and $F_i := F_{i-1} \land C_i$ for $1 \le i \le n$. Each proof step in a DRAT proof either deletes or adds a clause depending on whether $d_j$ is 1 or 0 (respectively). For $1 \le j \le m$ we either have $G_j := G_{j-1} \setminus \{D_j\}$ if $d_j$ is 1 or $G_j := G_{j-1} \land D_j$ if $d_j$ is 0.

Each single PR derivation step $(0, C_i, \omega_i)$ is also a PR derivation of $F_i$ from $F_{i-1}$ and our conversion algorithm simply translates each such PR derivation step separately into a DRAT derivation of $F_i$ from $F_{i-1}$. The conversion of the whole PR derivation is then obtained as concatenation of these individual DRAT derivations, which gives a DRAT derivation of $F_n$ from $F_0$. We will first offer an informal top-down description of converting a single PR derivation step into a sequence of DRAT steps.


#### **4.1 Top-Down**

Consider a formula $F$ and a clause $C$ which has PR w.r.t. $F$ with witness $\omega$, i.e., a single PR derivation step. The central question addressed in this paper is how to construct a DRAT derivation of $F \land C$ from $F$. The constructed DRAT derivation $(d_1, C_1, \omega_1), \dots, (d_q, C_q, \omega_q), (d_{q+1}, C_{q+1}, \omega_{q+1}), \dots, (d_p, C_p, \omega_p)$ of $F \land C$ from $F$ consists of three parts. It also requires introducing a (new) Boolean variable $x$ that does not occur in $F$.

	- b. there exists a DRAT derivation from $F \land (x \lor C)$ to $F \land C$.

Notice that $(x \lor C)$ is blocked w.r.t. $F$ and could therefore be added to $F$ as a first step. However, it is very hard to eliminate literal $x$ from $F \land (x \lor C)$. Instead, we transform $F$ into $F'$ before the addition and reverse the transformation afterwards. Below we describe the details of our simulation algorithm in five phases, of which phases (I) and (II) correspond to the transformation (part 1.) and phases (IV) and (V) correspond to the reverse transformation (part 3.).

#### **4.2 Five Phases**

We will show how to transform the derivation of $F_{i+1}$ from $F_i$ using PR step $(0, C_{i+1}, \omega_{i+1})$ into a sequence of $p$ DRAT proof steps from $G_j$ to $G_{j+p}$ such that $G_j = F_i$ and $G_{j+p} = F_{i+1}$. In the description below, $F$ refers to $F_i$, $C$ refers to $C_{i+1}$, and $\omega$ refers to $\omega_{i+1}$. Further let $x$ be a new Boolean variable, i.e., $x$ does not occur in $F$. We can assume that $\mathit{var}(C) \subseteq \mathit{var}(F)$. Otherwise there exists a literal $l \in C$ with $\mathit{var}(l) \notin \mathit{var}(F)$. Thus $C$ is blocked on $l$ w.r.t. $F$ and can be added to $F$ using a single RAT step.


a clause containing $\bar{l}$. If it is a weakened clause $(x \lor E)$ of $E$ where $E \in F$ is satisfied by $\omega$, then $x$ occurs in opposite phase and the resolvent is a tautology (same condition as for blocked clauses). Otherwise the resolvent on $l$ of $(\bar{x} \lor l)$ with the clause containing $\bar{l}$ is subsumed by a clause $(\bar{x} \lor D)$ with $D \in F|_\omega \setminus F$ added in the first step above. The resulting formula, where all involved clauses in G(I) are weakened, is denoted by G(II).

(III) *Add the weakened* PR *clause.*

Add the clause $(x \lor C)$ to G(II), resulting in G(III). The key observation related to this phase is that $(x \lor C)$ has RAT on $x$ w.r.t. G(II): The only clauses in G(II) that contain literal $\bar{x}$ are the ones that were added in the first phase. We need to show that G(II) implies every clause $(x \lor C \lor D)$ with $D \in F|_\omega \setminus F$ by unit propagation. Let $\alpha$ be the smallest assignment that falsifies $C$. Since $C$ has PR w.r.t. $F$ using witness $\omega$, we know that $F|_\alpha \vdash_1 D$ with $D \in F|_\omega \setminus F$. This is equivalent to $F \vdash_1 (C \lor D)$ with $D \in F|_\omega \setminus F$. Furthermore $G(\mathrm{II})|_{\bar{x}} \supseteq F$. Hence, $G(\mathrm{II})|_{\bar{x}} \vdash_1 (C \lor D)$ or equivalently, $G(\mathrm{II}) \vdash_1 (x \lor C \lor D)$.
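The notion of a clause being implied by unit propagation, on which the argument above relies, can be illustrated with a small sketch. The Python below is only an illustration under stated assumptions: clauses are encoded as frozensets of DIMACS-style integers, and the helper names `unit_propagate` and `implied_by_up` are hypothetical and not taken from the PR2DRAT tool.

```python
from typing import FrozenSet, Iterable, List, Optional, Set

Clause = FrozenSet[int]  # assumed encoding: non-zero ints, negative = negated literal


def unit_propagate(clauses: List[Clause], assignment: Iterable[int]) -> Optional[Set[int]]:
    """Extend the assignment by unit propagation; return None if a conflict is found."""
    assigned = set(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assigned for lit in clause):
                continue  # clause is already satisfied
            open_lits = [lit for lit in clause if -lit not in assigned]
            if not open_lits:
                return None  # every literal is falsified: conflict
            if len(open_lits) == 1:
                assigned.add(open_lits[0])  # unit clause forces its last literal
                changed = True
    return assigned


def implied_by_up(clauses: List[Clause], clause: Clause) -> bool:
    """True if propagating the negation of `clause` over `clauses` yields a conflict."""
    negation = {-lit for lit in clause}
    if any(-lit in negation for lit in negation):
        return True  # a tautological clause is trivially implied
    return unit_propagate(clauses, negation) is None


# (a or b) and (not a or b) imply the unit clause (b) by unit propagation.
formula = [frozenset({1, 2}), frozenset({-1, 2})]
print(implied_by_up(formula, frozenset({2})))
```

In the setting of phase (III), such a check would be applied with G(II) as the clause set and each candidate clause built from the weakened PR clause and a shortened copy.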

(IV) *Strengthen all weakened clauses.*

The fourth phase removes all occurrences of the literal $x$ from clauses in G(III), thereby reversing the second phase *and* strengthening $(x \lor C)$ to $C$. This phase consists of three parts. First, we reintroduce the implication $x \Rightarrow \omega$, i.e., the clauses $(\bar{x} \lor l)$ with $l \in \omega$. These clauses have RAT on $l$ w.r.t. G(III) by the same reasoning used to remove them in the second phase above. In case $(\bar{x} \lor l)$ can be resolved on $l$ with the only new clause $(x \lor C)$ added in the third phase, i.e., $\bar{l} \in C$, the resolvent is a tautology (it contains $x$ and $\bar{x}$). Afterwards, we strengthen all clauses $(x \lor E)$ in G(III) to $E$ as follows. Note that this also strengthens clause $(x \lor C)$ to $C$. Observe that all clauses $(x \lor E)$ in G(III), including $(x \lor C)$, are satisfied by $\omega$ and therefore there exists a clause $(\bar{x} \lor l)$ with $l \in E$. Self-subsuming literal elimination (SSLE) can now eliminate all literals $x$. Finally, the implication $x \Rightarrow \omega$ is no longer required. The clauses $(\bar{x} \lor l)$ with $l \in \omega$, which have now been added twice, can be removed again since literal $\bar{x}$ has become pure due to the strengthening of all clauses containing literal $x$. The resulting formula, obtained from G(III) by removing all occurrences of literal $x$, is denoted by G(IV).

(V) *Remove the shortened copies.*

The fifth phase reverses the first phase, and actually uses the same argument as the fourth phase. All clauses in G(III) that contained a literal $x$ were strengthened by removing these literals in phase four. As a consequence, the literal $\bar{x}$ is (still) pure in G(IV). The only clauses that still contain literal $\bar{x}$ are exactly the clauses that have been added in the first phase. Since they are all blocked on $\bar{x}$ w.r.t. G(IV), they can be eliminated, while preserving satisfiability. After removing these clauses we obtain G(V), which equals $F \land C$.
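The phases distinguish clauses by how the witness touches them: clauses that are reduced but not satisfied give rise to shortened copies, while clauses that are reduced and satisfied are weakened. A minimal Python sketch of this classification follows. It assumes a DIMACS-style integer encoding of literals, and the function name `classify` is hypothetical; it illustrates the case split only and is not the authors' implementation.

```python
from typing import FrozenSet, List, Set, Tuple

Clause = FrozenSet[int]  # assumed encoding: non-zero ints, negative = negated literal


def classify(formula: List[Clause], omega: Set[int]) -> Tuple[List[Clause], List[Clause]]:
    """Split the clauses touched by witness omega.

    Returns (shortened, involved):
      shortened: the reducts D of clauses that are reduced but not satisfied by
                 omega, keeping only those D that are not already in the formula;
      involved:  the clauses that are both reduced and satisfied by omega.
    """
    existing = set(formula)
    shortened: List[Clause] = []
    involved: List[Clause] = []
    for clause in formula:
        reduced = any(-lit in omega for lit in clause)    # some literal is falsified
        satisfied = any(lit in omega for lit in clause)   # some literal is true
        if reduced and satisfied:
            involved.append(clause)
        elif reduced:
            reduct = frozenset(lit for lit in clause if -lit not in omega)
            if reduct not in existing and reduct not in shortened:
                shortened.append(reduct)
    return shortened, involved


formula = [frozenset({1, 2}), frozenset({-1, 3}), frozenset({-1, -2}), frozenset({3, 4})]
shortened, involved = classify(formula, {1, -2})
print(shortened)  # reducts that would be copied
print(involved)   # clauses that would be weakened
```

Clauses that are untouched by the witness, such as the last clause in the example, appear in neither list and play no role in the conversion of that step.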

#### **4.3 Complexity**

In this section we analyze the worst-case complexity of converting a PR derivation $(0, C_1, \omega_1), \dots, (0, C_n, \omega_n)$ of a formula $F_n$ from a formula $F_0$ into a DRAT derivation $(d_1, D_1, \omega'_1), \dots, (d_m, D_m, \omega'_m)$ of $G_m = F_n$ from $G_0 = F_0$ using the presented simulation algorithm. The number of DRAT steps that are required to simulate a single PR addition step depends on the size of the formula. Let $N = |F_n|$ be the number of clauses in the last formula $F_n$ and $V = |\mathit{var}(F_n)|$ the number of its variables. Since a PR derivation does not remove clauses, we have $|F_i| = |F_{i-1}| + 1$ and $|\mathit{var}(F_i)| \geq |\mathit{var}(F_{i-1})|$. Therefore, for $i \in \{1, \dots, n\}$, $|F_i| \leq N$ and $|\mathit{var}(F_i)| \leq V$. In the analysis we ignore clause deletion, since the number of clause deletions is bounded by the number of added clauses.

In phase (I) of the conversion algorithm, copies of clauses that are reduced but not satisfied by $\omega_i$ are added, while in phase (II) the clauses that are reduced and satisfied by $\omega_i$ are weakened. Since a clause is either satisfied, not satisfied, or untouched by $\omega_i$, the sum of the number of copies and weakened clauses is at most $|F_i| \leq N$. Also the implication $x \Rightarrow \omega_i$ is added in phase (II), meaning at most $|\mathit{var}(\omega_i)| \leq |\mathit{var}(F_i)| \leq V$ clause addition steps. Phase (III) adds a single clause. Phase (IV) again adds the implication $x \Rightarrow \omega_i$ (at most $V$ steps) and strengthens all weakened clauses (at most $N$ steps). Phase (V) only deletes clauses. Thus the total number of clause additions for all phases in the conversion of a single PR step is bounded by $2V + 2N + 1$.

There are $n \leq N$ additions in the PR proof and for each addition we apply the conversion algorithm. Hence the total number of clause addition steps in the DRAT derivation is at most $2NV + 2N^2 + N$. Since $V \leq N$ for any interesting PR derivation, the number of steps in the resulting DRAT derivation is in $O(N^2)$.
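As a quick sanity check of the arithmetic, the two bounds can be written down directly. The function names in this Python sketch are hypothetical; it only restates the per-step bound $2V + 2N + 1$ and multiplies it by the at most $N$ PR additions.

```python
def additions_per_pr_step(num_clauses: int, num_vars: int) -> int:
    """Upper bound on clause additions needed to simulate one PR step: 2V + 2N + 1."""
    return 2 * num_vars + 2 * num_clauses + 1


def total_additions_bound(num_clauses: int, num_vars: int) -> int:
    """Upper bound over at most N PR steps: N * (2V + 2N + 1) = 2NV + 2N^2 + N."""
    return num_clauses * additions_per_pr_step(num_clauses, num_vars)


# For N = 10 clauses and V = 5 variables the bound is 2*10*5 + 2*10**2 + 10.
print(total_additions_bound(10, 5))
```

With $V \leq N$ the quadratic term $2N^2$ dominates, which matches the $O(N^2)$ statement.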

#### **5 Optimizations**

The simulation algorithm described in the prior section was designed to result in compact DRAT derivations using a single new variable, while focussing on converting any PR derivation into a DRAT derivation. The algorithm can be further optimized to reduce the size of the resulting DRAT derivations.

#### **5.1 Refutations**

In practice, most PR derivations are refutations, i.e., they include adding the empty clause. When converting PR refutations, one can ignore the justification of any weakening steps, as such steps trivially preserve unsatisfiability. The only weakening steps in the simulation algorithm are performed in phase (II). The purpose of the addition of the implication $x \Rightarrow \omega$ in phase (II) is to allow the weakening via self-subsuming literal addition (SSLA). This justification is no longer required for PR refutations. Without the addition of $x \Rightarrow \omega$, one can also discard its removal. So both the first and third part of phase (II) can be omitted.

#### **5.2 Witness Minimization**

In some situations, only a subset of the involved clauses needs to be weakened (phase (II)) and later strengthened (phase (IV)). Weakening of involved clauses is required to make sure that the clauses $(\bar{x} \lor l)$ with $l \in \omega$ are RAT on $l$ w.r.t. G(III) in phase (IV) of the simulation algorithm. However, some of the clauses $(\bar{x} \lor l)$ may be unit implied by others (and need not be RAT on $l$). This situation occurs when a subset of the witness implies the full witness via unit propagation. We minimize the witness by searching for the smallest witness $\omega' \subseteq \omega$ such that $\omega'$ implies $\omega$ via unit propagation. Only clauses reduced by $\omega'$ and satisfied by $\omega$ need to be weakened in phase (II) and strengthened in (IV).
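One simple way to approximate such a minimization is a greedy pass that drops a witness literal whenever the remaining literals still derive the full witness by unit propagation. The Python sketch below rests on several assumptions that are not taken from the paper: propagation is performed over the current formula, literals are DIMACS-style integers, the names `propagate` and `minimize_witness` are hypothetical, and the greedy result is only minimal with respect to removing single literals, so it need not be the smallest possible witness.

```python
from typing import FrozenSet, Iterable, List, Set

Clause = FrozenSet[int]  # assumed encoding: non-zero ints, negative = negated literal


def propagate(clauses: List[Clause], literals: Iterable[int]) -> Set[int]:
    """Return all literals derived from `literals` by unit propagation.

    Conflicts are ignored in this sketch: a falsified clause derives nothing.
    """
    assigned = set(literals)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assigned for lit in clause):
                continue  # clause is already satisfied
            open_lits = [lit for lit in clause if -lit not in assigned]
            if len(open_lits) == 1:
                assigned.add(open_lits[0])
                changed = True
    return assigned


def minimize_witness(clauses: List[Clause], omega: Set[int]) -> Set[int]:
    """Greedily drop literals that the remaining literals re-derive by propagation."""
    kept = sorted(omega, key=abs)
    for lit in sorted(omega, key=abs):
        candidate = [other for other in kept if other != lit]
        if omega <= propagate(clauses, candidate):
            kept = candidate  # lit is unit implied by the rest, so it can be dropped
    return set(kept)


# The clause (not 1 or 2) lets literal 1 imply literal 2, so {1, 2} shrinks to {1}.
print(minimize_witness([frozenset({-1, 2})], {1, 2}))
```

The order in which literals are tried influences which subset is found; a real implementation might try several orders or use a more careful search.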

#### **5.3 Avoiding Copying**

In one quite specific case, one can altogether avoid copying the clauses that are reduced, but not satisfied by the witness. In other words, one can skip phases (I) and (V) of the simulation algorithm. This case, however, occurred frequently in our PR proofs. Let $\alpha$ denote the smallest assignment that falsifies the PR clause $C$ to be added. Furthermore, let $\omega$ be the witness and $\omega'$ the minimized witness as discussed above. The condition for avoiding clause copying consists of two parts. First, there is no literal $l \in \alpha$ such that $\bar{l} \in \omega'$. Recall that there always exists a literal $l \in \alpha$ such that $\bar{l} \in \omega$. So witness minimization is necessary. Second, for each literal $l \in \omega'$, the unit clause $(l)$ should be RAT on $l$ w.r.t. the current formula without the involved clauses under $\alpha$. Although both conditions are very restrictive, they apply often in the PR proofs used in the evaluation.

Basically, this optimization removes phases (I) and (V), and modifies (II), (III), and (IV). The modified phases are named phase (i), (ii), and (iii), resp.

	- Clause $E \in F$ is called *involved* if it is reduced by the minimized witness $\omega'$ and satisfied by the original $\omega$. The first phase weakens all involved clauses $E$ to $(x \lor E)$ as follows. First, we add the implication $x \Rightarrow \omega' \cup \alpha$, i.e., the clauses $(\bar{x} \lor l)$ with $l \in \omega' \cup \alpha$. These clauses are blocked because $G$ does not contain clauses with literal $x$. Now we can weaken the involved clauses using SSLA. Then we remove the implication part $x \Rightarrow \omega'$, but keep $x \Rightarrow \alpha$. When adding this implication, the clauses $(\bar{x} \lor l)$ with $l \in \omega'$ were blocked on $\bar{x}$. Now we can remove them, because they have RAT on $l$, as all clauses containing $\bar{l}$ have been either weakened (if they were satisfied by $\omega$) or are implied by $\alpha$ by the second condition. The resulting formula, i.e., $G$ in which all involved clauses are weakened and which includes $x \Rightarrow \alpha$, is denoted by G(i).

(iii) *Strengthen all weakened clauses.*

The third phase removes all occurrences of the literal $x$ from clauses in G(ii), thereby reversing the second phase *and* strengthening $(x \lor C)$ to $C$. This phase consists of four parts. First, we reintroduce the implication part $x \Rightarrow \omega'$, i.e., the clauses $(\bar{x} \lor l)$ with $l \in \omega'$. Again, these clauses have RAT on $l$ w.r.t. G(ii). Second, we remove the implication part $x \Rightarrow \alpha$, i.e., the clauses $(\bar{x} \lor l)$ with $l \in \alpha$. Afterwards, we strengthen $(x \lor C)$ to $C$ and all clauses $(x \lor E)$ in G(ii) to $E$. Observe that all clauses $(x \lor E)$ in G(ii), including $(x \lor C)$, are satisfied by $\omega$ and therefore there exists a clause $(\bar{x} \lor l)$ with $l \in E$. SSLE can therefore remove all literals $x$. Finally, the implication $x \Rightarrow \omega'$ is no longer required. The clauses $(\bar{x} \lor l)$ with $l \in \omega'$ can be eliminated because literal $\bar{x}$ has become pure due to the strengthening of all clauses containing literal $x$. The resulting formula, i.e., G(ii) after removing all occurrences of literal $x$, is denoted by G(iii) and equals $G \land C$.

In case the PR derivation is a refutation, we can further optimize this case by changing phase (i) as follows: Instead of adding the implication $x \Rightarrow \omega' \cup \alpha$, the implication $x \Rightarrow \alpha$ is added. Without the addition of the implication part $x \Rightarrow \omega'$, we can also discard removing that part at the end of phase (i).

#### **6 Alternative Simulation Algorithms**

Even though the conversion from PR derivations to DRAT derivations is arguably the most useful one in practice, one can also consider the following alternatives.

#### **6.1 Limiting the Number of RAT Steps**

Most steps in the simulation algorithm are "basic" steps, i.e., self-subsuming literal addition or elimination and blocked clause addition or elimination. There are only a few "full" RAT addition steps: the removal of the implication in phase (II), the addition of the weakened PR clause in phase (III), and the addition of the implication in phase (IV). It is interesting to explore the option of reducing the number of these "full" RAT addition steps. Eliminating "full" RAT addition steps brings us close to a simulation algorithm with only basic steps.

It is easy to eliminate all but one "full" RAT addition step. In order to eliminate the RAT steps in phase (II), one can weaken the clauses (i.e., add a literal $x$ using SSLA) that are reduced but not satisfied by the witness using the shortened copies of clauses that are reduced, but not satisfied by $\omega$. After the weakening, we can remove the implication $x \Rightarrow \omega$ using blocked clause elimination (instead of RAT), because now all clauses that are touched by $\omega$ have a literal $x$. Therefore all clauses $(\bar{x} \lor l)$ with $l \in \omega$ are blocked on $l$. The weakening also allows adding the implication $x \Rightarrow \omega$ in phase (IV) using blocked clause addition steps (instead of RAT). The strengthening of the newly weakened clauses can be performed in phase (IV) using SSLE (after adding the implication). It is not obvious how to replace the only remaining RAT addition in phase (III) using basic steps.

#### **6.2 Converting DPR Proofs into DRAT Proofs**

So far we only considered converting a PR clause addition into a sequence of DRAT steps and ignored deletion of PR clauses from a formula. In most cases, clause deletion steps in a proof facilitate more efficient checking of a proof of unsatisfiability and can therefore be performed without any checking. However, there are situations in which one wants to check the validity of clause deletion steps. This is the case in particular for proofs of satisfiability, i.e., sequences of proof steps that show that a given formula is equivalent to the empty formula and thus satisfiable.

The DPR proof system is a clausal proof system that allows the addition and deletion of PR clauses. Conversion of a PR clause addition step into DRAT proof steps is equivalent to the conversion of such a step in the PR proof system. The conversion of a PR clause deletion step is slightly different. Consider a formula $F$ and a clause $C \in F$ that is a PR clause w.r.t. $F$ with witness $\omega$. The first phase of the conversion is exactly the same as phase (I) of the PR clause addition conversion. The second phase of the conversion is slightly different compared to phase (II) of the PR clause addition conversion: Instead of weakening all clauses reduced and satisfied by $\omega$, we weaken all clauses satisfied by $\omega$. Notice that this includes weakening $C$ to $(x \lor C)$. The third phase consists of deleting $(x \lor C)$ from the current formula. Recall that phase (III) of the PR clause addition conversion added $(x \lor C)$. The final phase corresponds to phases (IV) and (V).

#### **6.3 Converting PR Refutations into RAT Refutations**

The presented simulation algorithm converts PR derivations into DRAT derivations. We selected the DRAT proof system as target, because it is the proof system most widely supported by top-tier SAT solvers and it allows step-wise simulation using deletion steps. The question arises whether deletion steps are required when converting a PR refutation. In short, the answer is *no* when allowing the introduction of arbitrarily many new Boolean variables. Converting a deletion step can be realized as follows. Let $C$ be the clause that is deleted from a formula $F$. For each $x \in \mathit{var}(C)$, add to $F$ the equivalence $x \Leftrightarrow x'$ with $x'$ being a new variable. Afterwards, copy all clauses in $F$ (apart from $C$) that contain at least one literal $l$ with $\mathit{var}(l) \in \mathit{var}(C)$, using the new $x'$ variables instead of the old $x$ variables. Finally, replace all occurrences of old literals $x$ and $\bar{x}$ in the remaining proof by literals $x'$ and $\bar{x}'$, respectively.

In order to limit the number of copy operations, one can group (consecutive) deletion steps and use the same variables $x'$ for the group. The simulation algorithm can be partitioned into two groups of (consecutive) clause addition steps, each followed by a group of consecutive clause deletion steps: The first group of addition steps consists of phase (I) and the first half of phase (II), i.e., adding the implication $x \Rightarrow \omega$ and the weakened involved clauses. The first group of deletion steps consists of the remaining part of phase (II), i.e., deletion of the involved clauses and deletion of the implication $x \Rightarrow \omega$. The second group of consecutive addition steps consists of phase (III) and the first half of phase (IV), i.e., adding the implication $x \Rightarrow \omega$ and adding back the involved clauses. The second group of consecutive deletion steps consists of the remaining part of phase (IV), i.e., removal of the weakened involved clauses and the implication $x \Rightarrow \omega$, and phase (V). By grouping the deletion steps, one can convert PR refutations into RAT refutations with at most a quadratic blowup, so the same worst-case complexity as converting PR derivations into DRAT derivations.
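The renaming construction for a single deleted clause can be sketched as follows. This Python is illustrative only: the DIMACS-style integer encoding, the function name `simulate_deletion`, and the parameter `next_var` (the first unused variable index) are assumptions, and the sketch produces only the clauses to be added, without rewriting the remainder of the proof.

```python
from typing import Dict, FrozenSet, List, Tuple

Clause = FrozenSet[int]  # assumed encoding: non-zero ints, negative = negated literal


def simulate_deletion(
    formula: List[Clause], deleted: Clause, next_var: int
) -> Tuple[List[Clause], List[Clause], Dict[int, int]]:
    """Replace the deletion of `deleted` by clause additions over fresh variables.

    Returns (equivalences, copies, rename):
      equivalences: two clauses per variable of `deleted`, encoding old <=> fresh;
      copies:       every other clause that mentions such a variable, rewritten
                    over the fresh variables;
      rename:       the map from old variables to fresh ones.
    """
    rename: Dict[int, int] = {}
    for var in sorted({abs(lit) for lit in deleted}):
        rename[var] = next_var
        next_var += 1

    def rename_literal(lit: int) -> int:
        fresh = rename.get(abs(lit))
        if fresh is None:
            return lit  # variable does not occur in the deleted clause
        return fresh if lit > 0 else -fresh

    equivalences: List[Clause] = []
    for old, fresh in rename.items():
        equivalences.append(frozenset({-old, fresh}))  # old implies fresh
        equivalences.append(frozenset({old, -fresh}))  # fresh implies old

    copies = [
        frozenset(rename_literal(lit) for lit in clause)
        for clause in formula
        if clause != deleted and any(abs(lit) in rename for lit in clause)
    ]
    return equivalences, copies, rename


formula = [frozenset({1, 2}), frozenset({-1, 3}), frozenset({3, 4})]
equivalences, copies, rename = simulate_deletion(formula, frozenset({1, 2}), next_var=5)
print(rename)  # old variable -> fresh variable
print(copies)  # renamed copies of the clauses sharing a variable with the deleted clause
```

The returned `rename` map is what a converter would then apply to all later proof steps, so that the deleted clause no longer constrains the variables the rest of the proof talks about.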

### **7 Evaluation**

We implemented a tool, called PR2DRAT, to convert PR proofs into DRAT proofs<sup>1</sup> and evaluated the tool on short PR proofs for hard formulas from three families:

(1) pigeon-hole, (2) two-pigeons-per-hole [2], and (3) Tseitin formulas [4,27].

Every resolution proof of a formula in these families is exponential in the size of the formula [11,28]. As a consequence, any CDCL solver without dedicated special reasoning techniques, such as cardinality or XOR reasoning, is unable to solve these benchmarks in reasonable time. In contrast, our PR proofs are smaller than the formulas, so linear in size. The PR proofs of the pigeon-hole formulas and two-pigeons-per-hole formulas have been constructed manually in earlier work [17]. The proofs of the Tseitin formulas have been manually constructed by expressing Gaussian elimination in the PR system. Applying Gaussian elimination, after syntactically extracting XOR constraints from the CNF formulas, is enough to solve these formulas. We will first evaluate the size of the conversion. Afterwards we certify, for the first time, the short PR proofs by converting them into DRAT proofs which are checked by a formally verified checker.

#### **7.1 Proof Simulation and Optimization**

We will compare three kinds of DRAT proofs for the benchmarks used in the experiments: the most compact existing ones [14,15], the proofs obtained from using our plain conversion algorithm, and the proofs obtained from our optimized algorithm. The most compact existing ones originate from expressing symmetry breaking as DRAT proof steps. Table 1 shows the comparison. All proofs have been trimmed using the DRAT-trim tool [12] once. Applying DRAT-trim multiple rounds (using the output proof as input proof for the next round) allows further reduction of the proof size, but typically these extra reductions are small.

For pigeon-hole formulas over $n$ pigeons, the most compact existing proofs have $O(n^4)$ proof steps. This is also the case for the DRAT proofs obtained through our basic conversion algorithm as well as for the extended resolution proofs by Cook [5]. However, DRAT proofs obtained with our optimized algorithm have only $O(n^3)$ proof steps. Notice that the size of pigeon-hole formulas as well as the size of PR proofs are both in $O(n^3)$. In other words, our optimized conversion algorithm can not only produce DRAT proofs, but for pigeon-hole formulas it generates the first DRAT proofs of linear size.

<sup>1</sup> The tool, checkers, formulas, and proofs discussed in this section are available at http://www.cs.utexas.edu/~marijn/pr2drat/.


**Table 1.** Comparison of the size of trimmed, generated DRAT proofs for hard formulas. The size of proofs is measured in the number of clause addition steps (#add). We denote with "—" that no DRAT proof is available. Bold is used for the smallest DRAT proofs.

The results for the two-pigeons-per-hole formulas are similar, but more pronounced: Existing DRAT proofs are available only for the formulas up to 12 holes and 25 pigeons (tph12) [15]. Our plain simulation algorithm can produce DRAT proofs of the formulas up to 20 holes and 41 pigeons (tph20). Moreover, our optimized simulation algorithm is able to produce proofs that are linear in the size of the formulas, although not linear in the size of the PR proofs.

We are unaware of any DRAT proofs of hard Tseitin formulas, e.g., from the Urquhart-s5-b\* family [4], nor of any tool able to produce such DRAT proofs. However, we succeeded in manually producing short PR proofs without new variables for these formulas and converting them into DRAT proofs. The resulting DRAT proofs, with and without optimizations, are relatively large compared to the PR proofs. The blowup is close to the quadratic worst case. We observed that DRAT-trim was able to remove many (around 70%) of the clause additions, which suggests that there could be an optimization to generate shorter DRAT proofs.

#### **7.2 Verified PR Proof Checking**

Our proof simulation approach can be used to validate PR proofs with formally verified tools, thereby increasing the confidence in their correctness. The tool chain works as follows: Given a formula $F$ and an alleged PR proof $P_{\mathrm{PR}}$ of $F$, our tool PR2DRAT converts $P_{\mathrm{PR}}$ into a DRAT proof $P_{\mathrm{DRAT}}$. Afterwards, we use the DRAT-trim tool to convert $P_{\mathrm{DRAT}}$ into a CLRAT (compressed linear RAT) proof $P_{\mathrm{CLRAT}}$. CLRAT proofs can be efficiently checked using formally verified checkers [7]. We used the verified checker ACL2check [13] to certify that $P_{\mathrm{CLRAT}}$ is a valid proof of unsatisfiability of $F$. Notice that the tools PR2DRAT and DRAT-trim are unverified and thus may turn an invalid proof into a valid proof or vice versa.

Figure 1 shows the results of applying this tool chain on the benchmark suite. The PR2DRAT tool was able to convert each PR proof into a DRAT proof in less than a minute and half of the proofs in less than a second. The runtimes of DRAT-trim and ACL2check are one to two orders of magnitude higher than for PR2DRAT. Thus our tool adds little overhead to the tool chain. The sizes of the DRAT and CLRAT proofs are comparable. However, these proofs are different: DRAT-trim (A) removes redundant clause additions; (B) includes hints to speed up verified checking; (C) compresses proofs. The effect of (A) depends on proof quality; (B) increases the size of proofs of small hard problems by roughly a factor of four; (C) reduces size to 30% of the uncompressed proofs. The difference between the DRAT and CLRAT proofs therefore indicates how much redundancy was removed: for pigeon-hole proofs hardly anything, for two-pigeons-per-hole proofs a modest amount, and for Tseitin proofs a lot. Notice that runtimes of the verified checker ACL2check are comparable to the C-based checker DRAT-trim.

**Fig. 1.** Certification of PR proofs using PR2DRAT, DRAT-trim, and the formally verified checker ACL2check. On the left, the sizes of proofs in the PR, DRAT, and CLRAT formats are shown in bytes; on the right, the proof conversion and checking times are shown in seconds. No times are shown for the Urquhart instances as all times were less than a second.

#### **8 Conclusions and Future Work**

We showed how to convert PR proofs into DRAT proofs using only a single new variable with an at most quadratic blowup in proof size. This result suggests that it might also be possible to construct DRAT proofs without new variables using one variable elimination step and reusing the eliminated variable. The optimizations implemented in our conversion tool PR2DRAT made it possible to produce DRAT proofs for hard problems that are significantly smaller compared to existing DRAT proofs of those problems. The main open question is whether PR proofs can be converted into RAT proofs (i.e., not allowing the deletion steps) with a small number of new variables. Without deletion steps, it seems that copying the formula using new variables is required.

Our new tool chain for certifying SAT solving results using PR proofs consists of four steps: proof production (solving), conversion from PR to DRAT, conversion from DRAT to CLRAT, and validation of the CLRAT proof using a formally verified checker. In order to hasten adoption of the approach, we are exploring elimination of the second step, by integrating the conversion algorithm in either SAT solvers or in DRAT proof checkers.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Abstraction Refinement for Emptiness Checking of Alternating Data Automata**

Radu Iosif and Xiao Xu

CNRS, Verimag, Université de Grenoble Alpes, Grenoble, France {Radu.Iosif,Xiao.Xu}@univ-grenoble-alpes.fr

**Abstract.** Alternating automata have been widely used to model and verify systems that handle data from finite domains, such as communication protocols or hardware. The main advantage of the alternating model of computation is that complementation is possible in linear time, thus allowing to concisely encode trace inclusion problems that occur often in verification. In this paper we consider alternating automata over infinite alphabets, whose transition rules are formulae in a combined theory of Booleans and some infinite data domain, that relate past and current values of the data variables. The data theory is not fixed, but rather it is a parameter of the class. We show that union, intersection and complementation are possible in linear time in this model and, though the emptiness problem is undecidable, we provide two efficient semi-algorithms, inspired by two state-of-the-art abstraction refinement model checking methods: lazy predicate abstraction [8] and the Impact semi-algorithm [17]. We have implemented both methods and report the results of an experimental comparison.

### **1 Introduction**

The language inclusion problem is recognized as being central to verification of hardware, communication protocols and software systems. A property is a specification of the correct executions of a system, given as a set P of executions, and the verification problem asks if the set S of executions of the system under consideration is contained within P. This problem is at the core of widespread verification techniques, such as automata-theoretic model checking [23], where systems are specified as finite-state automata and properties defined using Linear Temporal Logic [21]. However the bottleneck of this and other related verification techniques is the intractability of language inclusion (PSPACE-complete for finite-state automata over finite alphabets).

Alternation [3] was introduced as a generalization of nondeterminism, introducing universal, in addition to existential transitions. For automata over finite alphabets, the language inclusion problem can be encoded as the emptiness problem of an alternating automaton of linear size. Moreover, efficient exploration techniques based on antichains are shown to perform well for alternating automata over finite alphabets [5].

Using finite alphabets for the specification of properties and models is however very restrictive, when dealing with real-life computer systems, mostly because of the following reasons. On one hand, programs handle data from very large domains, that can be assumed to be infinite (64-bit integers, floating point numbers, strings of characters, etc.) and their correctness must be specified in terms of the data values. On the other hand, systems must respond to strict deadlines, which requires temporal specifications as timed languages [1].

Although they are convenient specification tools, automata over infinite alphabets lack the decidability properties ensured by finite alphabets. In general, when considering infinite data as part of the input alphabet, language inclusion is undecidable and even complementation becomes impossible, for instance, for timed automata [1] or finite-memory register automata [13]. One can recover theoretical decidability by restricting the number of variables (clocks) in timed automata to one [20], or by forbidding relations between current and past/future values, as with symbolic automata [24]. In such cases, the emptiness problem for the alternating versions also becomes decidable [4,14].

In this paper, we present a new model of alternating automata over infinite alphabets consisting of pairs (a, ν) where a is an input event from a finite set and ν is a valuation of a finite set **x** of variables that range over an infinite domain. We assume that, at all times, the successive values taken by the variables in **x** are an observable part of the language, in other words, there are no hidden variables in our model. The transition rules are specified by a set of formulae, in a combined first-order theory of Boolean control states and data, that relate past with present values of the variables. We do not fix the data theory a priori, but rather consider it to be a parameter of the class.

A run over an input word $(a_1, \nu_1) \dots (a_n, \nu_n)$ is a sequence $\phi_0(\mathbf{x}_0) \Rightarrow \phi_1(\mathbf{x}_0, \mathbf{x}_1) \Rightarrow \dots \Rightarrow \phi_n(\mathbf{x}_0, \dots, \mathbf{x}_n)$ of rewritings of the initial formula by substituting Boolean states with time-stamped transition rules. The word is accepted if the final formula $\phi_n(\mathbf{x}_0, \dots, \mathbf{x}_n)$ holds, when all time-stamped variables $\mathbf{x}_1, \dots, \mathbf{x}_n$ are substituted by their values in $\nu_1, \dots, \nu_n$, all non-final states are replaced by false and all final states by true.

The Boolean operations of union, intersection and complement can be implemented in linear time in this model, thus matching the complexity of performing these operations in the finite-alphabet case. The price to be paid is that emptiness becomes undecidable, for which reason we provide two efficient semi-algorithms for emptiness, based on lazy predicate abstraction [8] and the Impact method [17]. These algorithms are proven to terminate and return a word from the language of the automaton, if one exists, but termination is not guaranteed when the language is empty.

We have implemented the Boolean operations and emptiness checking semi-algorithms and carried out experiments with examples taken from array logics [2], timed automata [9], communication protocols [25] and hardware verification [22].

**Related Work.** Data languages and automata have been defined previously, in a classical nondeterministic setting. For instance, Kaminski and Francez [13] consider languages, over an infinite alphabet of data, recognized by automata with a finite number of registers, that store the input data and compare it using equality. Just as the timed languages recognized by timed automata [1], these languages, called quasi-regular, are not closed under complement, but their emptiness is decidable. The impossibility of complementation here is caused by the use of hidden variables, which we do not allow. Emptiness is however undecidable in our case, mainly because counting (incrementing and comparing to a constant) data values is allowed, in many data theories.

Another related model is that of predicate automata [6], which recognize languages over integer data by labeling the words with conjunctions of uninterpreted predicates. We intend to explore further the connection with our model of alternating data automata, in order to apply our method to the verification of parallel programs.

The model presented in this paper stems from the language inclusion problem considered in [11]. There we provide a semi-algorithm for inclusion of data languages, based on an exponential determinization procedure and an abstraction refinement loop using lazy predicate abstraction [8]. In this work we consider the full model of alternation and rely entirely on the ability of SMT solvers to produce interpolants in the combined theory of Booleans and data. Since determinization is not needed and complementation is possible in linear time, the bulk of the work is carried out by the solver.

The emptiness check for alternating data automata adapts similar semi-algorithms for nondeterministic infinite-state programs to the alternating model of computation. In particular, we considered the state-of-the-art Impact procedure [17], which has been shown to outperform lazy predicate abstraction [8] in the nondeterministic case, and generalized it to cope with alternation. More recent approaches for interpolant-based abstraction refinement target Horn systems [10,18], used to encode recursive and concurrent programs [7]. However, the emptiness of alternating word automata cannot be directly encoded using Horn clauses, because all the branches of the computation synchronize on the same input, which cannot be encoded by a finite number of local (equality) constraints. We believe that the lazy annotation techniques for Horn clauses are suited for branching computations, which we intend to consider in a future tree automata setting.

#### **2 Preliminaries**

A *signature* S = (S<sup>s</sup>, S<sup>f</sup>) consists of a set S<sup>s</sup> of *sort symbols* and a set S<sup>f</sup> of sorted *function symbols*. To simplify the presentation, we assume w.l.o.g. that S<sup>s</sup> = {Data, Bool}<sup>1</sup> and each function symbol f ∈ S<sup>f</sup> has #(f) ≥ 0 arguments of sort Data and return value σ(f) ∈ S<sup>s</sup>. If #(f) = 0 then f is a *constant*. We consider constants ⊤ and ⊥ of sort Bool.

<sup>1</sup> The generalization to more than two sorts is without difficulty, but would unnecessarily clutter the technical presentation.

Let Var be a countably infinite set of *variables*, where each x ∈ Var has an associated sort σ(x). A *term* t of sort σ(t) = S is a variable x ∈ Var where σ(x) = S, or f(t<sub>1</sub>,...,t<sub>#(f)</sub>) where t<sub>1</sub>,...,t<sub>#(f)</sub> are terms of sort Data and σ(f) = S. An *atom* is a term of sort Bool or an equality t ≈ s between two terms of sort Data. A *formula* is an existentially quantified combination of atoms using disjunction ∨, conjunction ∧ and negation ¬, and we write φ → ψ for ¬φ ∨ ψ.

We denote by FV<sup>σ</sup>(φ) the set of free variables of sort σ in φ and write FV(φ) for ⋃<sub>σ∈S<sup>s</sup></sub> FV<sup>σ</sup>(φ). For a variable x ∈ FV(φ) and a term t such that σ(t) = σ(x), let φ[t/x] be the result of replacing each occurrence of x by t. For indexed sets **t** = {t<sub>1</sub>,...,t<sub>n</sub>} and **x** = {x<sub>1</sub>,...,x<sub>n</sub>}, we write φ[**t**/**x**] for the formula obtained by simultaneously replacing x<sub>i</sub> with t<sub>i</sub> in φ, for all i ∈ [1, n]. The size |φ| is the number of symbols occurring in φ.

An *interpretation* I maps (1) the sort Data into a non-empty set Data<sup>I</sup>, (2) the sort Bool into the set B = {true, false}, where ⊤<sup>I</sup> = true, ⊥<sup>I</sup> = false, and (3) each function symbol f into a total function f<sup>I</sup> : (Data<sup>I</sup>)<sup>#(f)</sup> → σ(f)<sup>I</sup>, or an element of σ(f)<sup>I</sup> when #(f) = 0. Given an interpretation I, a *valuation* ν maps each variable x ∈ Var into an element ν(x) ∈ σ(x)<sup>I</sup>. For a term t, we denote by t<sup>I</sup><sub>ν</sub> the value obtained by replacing each function symbol f by its interpretation f<sup>I</sup> and each variable x by its valuation ν(x). For a formula φ, we write I, ν |= φ if the formula obtained by replacing each term t in φ by the value t<sup>I</sup><sub>ν</sub> is logically equivalent to true.

A formula φ is *satisfiable* in the interpretation I if there exists a valuation ν such that I, ν |= φ, and *valid* if I, ν |= φ for all valuations ν. The *theory* T(S, I) is the set of valid formulae written in the signature S, with the interpretation I. A *decision procedure* for T(S, I) is an algorithm that takes a formula φ in the signature S and returns yes iff φ ∈ T(S, I).

Given formulae φ and ψ, we say that φ *entails* ψ, denoted φ |=<sup>I</sup> ψ, iff I, ν |= φ implies I, ν |= ψ, for each valuation ν, and φ ⇔<sup>I</sup> ψ iff φ |=<sup>I</sup> ψ and ψ |=<sup>I</sup> φ. We omit mentioning the interpretation I when it is clear from the context.

### **3 Alternating Data Automata**

In the rest of this section we fix an interpretation I and a finite alphabet Σ of *input events*. Given a finite set **x** ⊂ Var of variables of sort Data, let **x** → Data<sup>I</sup> be the set of valuations of the variables **x** and Σ[**x**] = Σ × (**x** → Data<sup>I</sup>) be the set of *data symbols*. A *data word* (word in the sequel) is a finite sequence (a<sub>1</sub>, ν<sub>1</sub>)(a<sub>2</sub>, ν<sub>2</sub>)...(a<sub>n</sub>, ν<sub>n</sub>) of data symbols, where a<sub>1</sub>,...,a<sub>n</sub> ∈ Σ and ν<sub>1</sub>,...,ν<sub>n</sub> : **x** → Data<sup>I</sup> are valuations. We denote by ε the empty sequence, by Σ<sup>∗</sup> the set of finite sequences of input events and by Σ[**x**]<sup>∗</sup> the set of data words over **x**.

This definition generalizes the classical notion of words from a finite alphabet to the possibly infinite alphabet Σ[**x**]. Clearly, when Data<sup>I</sup> is sufficiently large or infinite, we can map the elements of Σ into designated elements of Data<sup>I</sup> and use a special variable to encode the input events. However, keeping Σ explicit simplifies several technical points below, without cluttering the presentation.

Given sets of variables **b**, **x** ⊂ Var of sort Bool and Data, respectively, we denote by Form(**b**, **x**) the set of formulae φ such that FV<sup>Bool</sup>(φ) ⊆ **b** and FV<sup>Data</sup>(φ) ⊆ **x**. By Form<sup>+</sup>(**b**, **x**) we denote the set of formulae from Form(**b**, **x**) in which each Boolean variable occurs under an even number of negations.

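An *alternating data automaton* (ADA or automaton in the sequel) is a tuple A = ⟨**x**, Q, ι, F, Δ⟩, where:

- **x** ⊂ Var is a finite set of variables of sort Data,
- Q ⊂ Var is a finite set of variables of sort Bool (*states*),
- ι ∈ Form<sup>+</sup>(Q, ∅) is the *initial configuration*,
- F ⊆ Q is a set of *final states*, and
- Δ : Q × Σ → Form<sup>+</sup>(Q, **x̄** ∪ **x**) is a *transition function*, where **x̄** = {x̄ | x ∈ **x**}.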

In each formula Δ(q, a) describing a transition rule, the variables **x̄** track the previous and **x** the current values of the variables of A. Observe that the initial values of the variables are left unconstrained, as the initial configuration does not contain free data variables. The size of A is defined as |A| = |ι| + ∑<sub>(q,a)∈Q×Σ</sub> |Δ(q, a)|.

**Fig. 1.** Alternating data automaton example

*Example.* Figure 1(a) depicts an ADA with input alphabet Σ = {a, b}, variables **x** = {x, y}, states Q = {q<sub>0</sub>, q<sub>1</sub>, q<sub>2</sub>, q<sub>3</sub>, q<sub>4</sub>}, initial configuration q<sub>0</sub>, final states F = {q<sub>3</sub>, q<sub>4</sub>} and transitions given in Fig. 1(b), where missing rules, such as Δ(q<sub>0</sub>, b), are assumed to be ⊥. Rules Δ(q<sub>0</sub>, a) and Δ(q<sub>1</sub>, a) are universal and there are no existential nondeterministic rules. Rules Δ(q<sub>1</sub>, a) and Δ(q<sub>2</sub>, a) compare past (x̄, ȳ) with present (x, y) values, Δ(q<sub>0</sub>, a) constrains the present and Δ(q<sub>1</sub>, b), Δ(q<sub>2</sub>, b) the past values, respectively.

Formally, let **x**<sub>k</sub> = {x<sub>k</sub> | x ∈ **x**}, for any k ≥ 0, be a set of time-stamped variables. For an input event a ∈ Σ and a formula φ, we write Δ(φ, a) (respectively Δ<sub>k</sub>(φ, a)) for the formula obtained from φ by simultaneously replacing each state q ∈ FV<sup>Bool</sup>(φ) by the formula Δ(q, a) (respectively Δ(q, a)[**x**<sub>k</sub>/**x̄**, **x**<sub>k+1</sub>/**x**], for k ≥ 0). Given a word w = (a<sub>1</sub>, ν<sub>1</sub>)(a<sub>2</sub>, ν<sub>2</sub>)...(a<sub>n</sub>, ν<sub>n</sub>), the *run* of A over w is the sequence of formulae:

$$
\phi\_0(Q) \Rightarrow \phi\_1(Q, \mathbf{x}\_0 \cup \mathbf{x}\_1) \Rightarrow \dots \Rightarrow \phi\_n(Q, \mathbf{x}\_0 \cup \dots \cup \mathbf{x}\_n)
$$

where φ<sub>0</sub> ≡ ι and, for all k ∈ [1, n], we have φ<sub>k</sub> ≡ Δ<sub>k−1</sub>(φ<sub>k−1</sub>, a<sub>k</sub>), so that the k-th step relates **x**<sub>k−1</sub> with **x**<sub>k</sub>. Next, we slightly abuse notation and write Δ(ι, a<sub>1</sub>,...,a<sub>n</sub>) for the formula φ<sub>n</sub>(**x**<sub>0</sub>,..., **x**<sub>n</sub>) above. We say that A *accepts* w iff I, ν |= Δ(ι, a<sub>1</sub>,...,a<sub>n</sub>), for some valuation ν that maps: (1) each x<sub>k</sub> ∈ **x**<sub>k</sub> to ν<sub>k</sub>(x), for all k ∈ [1, n], (2) each q ∈ FV<sup>Bool</sup>(φ<sub>n</sub>) ∩ F to ⊤ and (3) each q ∈ FV<sup>Bool</sup>(φ<sub>n</sub>) \ F to ⊥. The language of A is the set L(A) of words from Σ[**x**]<sup>∗</sup> accepted by A.
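On a concrete data word, the acceptance condition can be checked by unfolding the transition rules along the word, without building the final formula explicitly. The following Python sketch illustrates this on a toy automaton, which is not the one from Fig. 1. The tuple-based encoding of formulae, the names `holds` and `accepts`, and the explicit valuation `nu0` for the unconstrained initial values **x**<sub>0</sub> (which the definition quantifies existentially) are assumptions made for illustration only.

```python
from typing import Dict

# Illustrative encoding of positive formulae as nested tuples:
#   ("true",), ("false",), ("and", f, g), ("or", f, g),
#   ("state", q)     a Boolean state variable
#   ("atom", pred)   a data constraint, pred(prev, cur) -> bool
Valuation = Dict[str, int]


def holds(phi, k, word, nu0, delta, final) -> bool:
    """Evaluate phi after the first k symbols of `word` have been read."""
    kind = phi[0]
    if kind == "true":
        return True
    if kind == "false":
        return False
    if kind == "and":
        return (holds(phi[1], k, word, nu0, delta, final)
                and holds(phi[2], k, word, nu0, delta, final))
    if kind == "or":
        return (holds(phi[1], k, word, nu0, delta, final)
                or holds(phi[2], k, word, nu0, delta, final))
    if kind == "atom":
        # Atoms occur only inside transition rules, hence k >= 1 here.
        prev = nu0 if k == 1 else word[k - 2][1]
        cur = word[k - 1][1]
        return phi[1](prev, cur)
    if kind == "state":
        if k == len(word):
            # End of the word: final states become true, the others false.
            return phi[1] in final
        event = word[k][0]
        rule = delta.get((phi[1], event), ("false",))  # missing rules are false
        return holds(rule, k + 1, word, nu0, delta, final)
    raise ValueError(f"unknown formula kind: {kind}")


def accepts(iota, delta, final, word, nu0) -> bool:
    return holds(iota, 0, word, nu0, delta, final)


# Toy automaton (hypothetical): x starts at 0, grows by 1, never exceeds 3.
IOTA = ("state", "q0")
FINAL = {"q1", "q2"}
DELTA = {
    ("q0", "a"): ("and", ("and", ("state", "q1"), ("state", "q2")),
                  ("atom", lambda prev, cur: cur["x"] == 0)),
    ("q1", "a"): ("and", ("state", "q1"),
                  ("atom", lambda prev, cur: cur["x"] == prev["x"] + 1)),
    ("q2", "a"): ("and", ("state", "q2"),
                  ("atom", lambda prev, cur: cur["x"] <= 3)),
}


def word_of(values):
    """Build the data word (a, x=v1)(a, x=v2)... from a list of integers."""
    return [("a", {"x": v}) for v in values]
```

The universal rule for `q0` spawns two branches, `q1` and `q2`, which must both succeed on the same input; this is the synchronization on the input word that distinguishes alternation from plain nondeterminism.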

*Example.* The following sequence is a non-accepting run of the ADA from Fig. 1 on the word (a, ⟨0, 0⟩), (a, ⟨1, 1⟩), (b, ⟨2, 1⟩), where Data<sup>I</sup> = ℤ and the function symbols have standard arithmetic interpretation:

In this paper we tackle the following problems:

1. *Boolean closure*: given automata A<sub>1</sub> and A<sub>2</sub>, both with the same set of variables **x**, do there exist automata A<sub>∪</sub>, A<sub>∩</sub> and Ā<sub>1</sub> such that L(A<sub>∪</sub>) = L(A<sub>1</sub>) ∪ L(A<sub>2</sub>), L(A<sub>∩</sub>) = L(A<sub>1</sub>) ∩ L(A<sub>2</sub>) and L(Ā<sub>1</sub>) = Σ[**x**]<sup>∗</sup> \ L(A<sub>1</sub>)?
2. *Emptiness*: given an automaton A, is L(A) = ∅?

It is well known that other problems, such as *universality* (given an automaton A with variables **x**, does L(A) = Σ[**x**]<sup>∗</sup>?) and *inclusion* (given automata A<sub>1</sub> and A<sub>2</sub> with the same set of variables, does L(A<sub>1</sub>) ⊆ L(A<sub>2</sub>)?) can be reduced to the above problems. Observe furthermore that we do not consider cases in which the sets of variables in the two automata differ. An interesting problem in this case would be: given automata A<sub>1</sub> and A<sub>2</sub>, with variables **x**<sub>1</sub> and **x**<sub>2</sub>, respectively, such that **x**<sub>1</sub> ⊆ **x**<sub>2</sub>, does L(A<sub>1</sub>) ⊆ L(A<sub>2</sub>)↓<sub>**x**<sub>1</sub></sub>, where L(A<sub>2</sub>)↓<sub>**x**<sub>1</sub></sub> is the projection of the set of words L(A<sub>2</sub>) onto the variables **x**<sub>1</sub>? This problem is left for future work.

#### **3.1 Boolean Closure**

Given a set Q of Boolean variables and a set **x** of variables of sort Data, for a formula φ ∈ Form<sup>+</sup>(Q, **x**), with no negated occurrences of the Boolean variables, we define the formula φ̄ ∈ Form<sup>+</sup>(Q, **x**) recursively on the structure of φ:

$$\begin{array}{ll} \overline{\phi\_1 \lor \phi\_2} \equiv \overline{\phi\_1} \land \overline{\phi\_2} & \overline{\phi\_1 \land \phi\_2} \equiv \overline{\phi\_1} \lor \overline{\phi\_2} \\ \overline{\neg \phi} \equiv \neg \overline{\phi} \text{ if } \phi \text{ not atom} & \overline{\phi} \equiv \phi \text{ if } \phi \in Q \\ \overline{\phi} \equiv \neg \phi \text{ if } \phi \notin Q \text{ atom} \end{array}$$

We have |φ̄| = |φ|, for every formula φ ∈ Form<sup>+</sup>(Q, **x**).

In the following let A<sub>i</sub> = ⟨**x**, Q<sub>i</sub>, ι<sub>i</sub>, F<sub>i</sub>, Δ<sub>i</sub>⟩, for i = 1, 2, where w.l.o.g. we assume that Q<sub>1</sub> ∩ Q<sub>2</sub> = ∅. We define:

$$\begin{array}{l} \mathcal{A}\_{\cup} = \langle \mathbf{x}, Q\_1 \cup Q\_2, \iota\_1 \lor \iota\_2, F\_1 \cup F\_2, \Delta\_1 \cup \Delta\_2 \rangle\\ \mathcal{A}\_{\cap} = \langle \mathbf{x}, Q\_1 \cup Q\_2, \iota\_1 \land \iota\_2, F\_1 \cup F\_2, \Delta\_1 \cup \Delta\_2 \rangle\\ \overline{\mathcal{A}\_{1}} = \langle \mathbf{x}, Q\_1, \overline{\iota\_1}, Q\_1 \setminus F\_1, \overline{\Delta\_1} \rangle \end{array}$$

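where Δ̄<sub>1</sub>(q, a) is the formula obtained by applying the overline transformation defined above to Δ<sub>1</sub>(q, a), for all q ∈ Q<sub>1</sub> and a ∈ Σ. The following lemma shows the correctness of the above definitions: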

**Lemma 1.** *Given automata* A<sub>i</sub> = ⟨**x**, Q<sub>i</sub>, ι<sub>i</sub>, F<sub>i</sub>, Δ<sub>i</sub>⟩*, for* i = 1, 2*, such that* Q<sub>1</sub> ∩ Q<sub>2</sub> = ∅*, we have* L(A<sub>∪</sub>) = L(A<sub>1</sub>) ∪ L(A<sub>2</sub>)*,* L(A<sub>∩</sub>) = L(A<sub>1</sub>) ∩ L(A<sub>2</sub>) *and* L(Ā<sub>1</sub>) = Σ[**x**]<sup>∗</sup> \ L(A<sub>1</sub>)*.*

It is easy to see that |A<sub>∪</sub>| = |A<sub>∩</sub>| = |A<sub>1</sub>| + |A<sub>2</sub>| and |Ā<sub>1</sub>| = |A<sub>1</sub>|, thus the automata for the Boolean operations, including complementation, can be built in linear time. This matches the linear-time bounds for intersection and complementation of alternating automata over finite alphabets [3].
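The constructions above are purely syntactic, which the following Python sketch tries to make concrete. It assumes an illustrative encoding in which formulae are nested tuples, data atoms are opaque strings and an automaton is a plain dictionary; the names `dual`, `complement` and `combine` are hypothetical. As a simplifying assumption, the transformation of a negated data atom removes the double negation directly.

```python
def dual(phi):
    """Swap conjunction and disjunction, negate data atoms, keep states."""
    kind = phi[0]
    if kind == "and":
        return ("or", dual(phi[1]), dual(phi[2]))
    if kind == "or":
        return ("and", dual(phi[1]), dual(phi[2]))
    if kind == "state":
        return phi                      # states are left untouched
    if kind == "true":
        return ("false",)
    if kind == "false":
        return ("true",)
    if kind == "atom":
        return ("not", phi)             # data atoms are negated
    if kind == "not":
        inner = phi[1]
        if inner[0] == "atom":
            return inner                # assumption: drop the double negation
        return ("not", dual(inner))
    raise ValueError(f"unknown formula kind: {kind}")


def complement(ada):
    """Transform iota and every rule, and swap final with non-final states."""
    return {
        "states": set(ada["states"]),
        "sigma": set(ada["sigma"]),
        "iota": dual(ada["iota"]),
        "final": set(ada["states"]) - set(ada["final"]),
        # Missing rules stand for false, so their transformed version is true.
        "delta": {(q, a): dual(ada["delta"].get((q, a), ("false",)))
                  for q in ada["states"] for a in ada["sigma"]},
    }


def combine(a1, a2, op):
    """Union (op == "or") or intersection (op == "and") of two automata."""
    if set(a1["states"]) & set(a2["states"]):
        raise ValueError("the sets of states must be disjoint")
    return {
        "states": set(a1["states"]) | set(a2["states"]),
        "sigma": set(a1["sigma"]) | set(a2["sigma"]),
        "iota": (op, a1["iota"], a2["iota"]),
        "final": set(a1["final"]) | set(a2["final"]),
        "delta": {**a1["delta"], **a2["delta"]},
    }


# Hypothetical example automata over the events a and b.
A_EXAMPLE = {
    "states": {"q0", "q1"}, "sigma": {"a", "b"},
    "iota": ("state", "q0"), "final": {"q1"},
    "delta": {("q0", "a"): ("and", ("state", "q1"), ("atom", "x >= 0"))},
}
B_EXAMPLE = {
    "states": {"p0"}, "sigma": {"a", "b"},
    "iota": ("state", "p0"), "final": {"p0"}, "delta": {},
}
```

Each function visits every symbol of its input once, in line with the linear-time bound stated above.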

#### **4 Antichains and Interpolants for Emptiness**

The emptiness problem for ADA is undecidable, even in very simple cases. For instance, if Data<sup>I</sup> is the set of positive integers, an ADA can simulate an Alternating Vector Addition System with States (AVASS) using only atoms x ≥ k and x = x̄ + k, for k ∈ ℤ, with the classical interpretation of the function symbols on integers. Since reachability of a control state is undecidable for AVASS [15], ADA emptiness is undecidable.

Consequently, we give up on the guarantee for termination and build semi-algorithms that meet the requirements below:


Let us fix an automaton A = ⟨**x**, Q, ι, F, Δ⟩ whose (finite) input event alphabet is Σ, for the rest of this section. Given a formula φ ∈ Form<sup>+</sup>(Q, **x**) and an input event a ∈ Σ, we define the *post-image* function Post<sub>A</sub>(φ, a) ≡ ∃**x̄** . Δ(φ[**x̄**/**x**], a) ∈ Form<sup>+</sup>(Q, **x**), mapping each formula in Form<sup>+</sup>(Q, **x**) to a formula defining the effect of reading the event a. We generalize the post-image function to finite sequences of input events, as follows:

$$\begin{array}{l}\mathsf{Post}\_{\mathcal{A}}(\phi,\varepsilon) \equiv \phi \qquad \mathsf{Post}\_{\mathcal{A}}(\phi,ua) \equiv \mathsf{Post}\_{\mathcal{A}}(\mathsf{Post}\_{\mathcal{A}}(\phi,u),a) \\ \mathsf{Acc}\_{\mathcal{A}}(u) \equiv \mathsf{Post}\_{\mathcal{A}}(\iota,u) \land \bigwedge\_{q \in Q \setminus F} (q \to \bot), \text{ for any } u \in \Sigma^\* \end{array}$$

Then the emptiness problem for A becomes: does there exist u ∈ Σ<sup>∗</sup> such that the formula Acc<sub>A</sub>(u) is satisfiable? Observe that, since we ask a satisfiability query, the final states of A need not be constrained<sup>2</sup>. A naïve semi-algorithm enumerates all finite sequences and checks the satisfiability of Acc<sub>A</sub>(u) for each u ∈ Σ<sup>∗</sup>, using a decision procedure for the theory T(S, I).
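As a small worked example, consider a toy automaton (assumed here for illustration, not the one from Fig. 1) with a single integer variable x, states Q = {q<sub>0</sub>, q<sub>1</sub>}, final states F = {q<sub>1</sub>}, initial configuration ι ≡ q<sub>0</sub> and rules Δ(q<sub>0</sub>, a) ≡ q<sub>1</sub> ∧ x ≈ 0 and Δ(q<sub>1</sub>, a) ≡ q<sub>1</sub> ∧ x ≈ x̄ + 1. Unfolding the definition of the post-image, and simplifying the existential quantifier by hand, gives:

$$\begin{array}{l} \mathsf{Post}\_{\mathcal{A}}(q\_0, a) \equiv \exists \overline{x} \,.\, (q\_1 \land x \approx 0) \Leftrightarrow q\_1 \land x \approx 0 \\ \mathsf{Post}\_{\mathcal{A}}(q\_0, aa) \equiv \exists \overline{x} \,.\, ((q\_1 \land x \approx \overline{x} + 1) \land \overline{x} \approx 0) \Leftrightarrow q\_1 \land x \approx 1 \end{array}$$

In both formulae the only remaining state is the final state q<sub>1</sub>, which the acceptance condition leaves unconstrained. For instance, the acceptance condition for the sequence aa has a model with q<sub>1</sub> true, q<sub>0</sub> false and x = 1, which shows that this toy automaton is non-empty.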

Since no Boolean variable from Q occurs under negation in φ, it is easy to prove the following monotonicity property: given two formulae φ, ψ ∈ Form<sup>+</sup>(Q, **x**), if φ |= ψ then Post<sub>A</sub>(φ, u) |= Post<sub>A</sub>(ψ, u), for any u ∈ Σ<sup>∗</sup>. This suggests an improvement of the above semi-algorithm, which enumerates and stores only a set U ⊆ Σ<sup>∗</sup> for which {Post<sub>A</sub>(ι, u) | u ∈ U} forms an *antichain*<sup>3</sup> w.r.t. the entailment partial order. This is because, for any u, v ∈ Σ<sup>∗</sup>, if Post<sub>A</sub>(ι, u) |= Post<sub>A</sub>(ι, v) and Acc<sub>A</sub>(uw) is satisfiable for some w ∈ Σ<sup>∗</sup>, then Post<sub>A</sub>(ι, uw) |= Post<sub>A</sub>(ι, vw), thus Acc<sub>A</sub>(vw) is satisfiable as well, and there is no need for u, since the non-emptiness of A can be proved using v alone. However, even with this optimization, the enumeration of sequences from Σ<sup>∗</sup> diverges in many real cases, because infinite antichains exist in many interpretations, e.g. q ∧ x ≈ 0, q ∧ x ≈ 1, ... for Data<sup>I</sup> = ℕ.
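The antichain optimization amounts to a simple insertion discipline, sketched below in Python. The entailment test is passed in as a callback, since in the setting of this paper it would be discharged by a decision procedure. In the toy instance, formulae are modelled as finite sets of models and entailment as set inclusion, which is an assumption made only to keep the sketch self-contained; the names `insert_antichain` and `entails_by_models` are hypothetical.

```python
from typing import Callable, FrozenSet, List, TypeVar

T = TypeVar("T")


def insert_antichain(chain: List[T], new: T,
                     entails: Callable[[T, T], bool]) -> bool:
    """Insert `new` while keeping only pairwise incomparable elements.

    Following the argument in the text, an element that entails another one
    is redundant, so the weakest elements are the ones that are kept.
    Returns True iff `new` was added to the chain.
    """
    if any(entails(new, old) for old in chain):
        return False                       # `new` is subsumed, drop it
    # Remove the stored elements that are subsumed by `new`.
    chain[:] = [old for old in chain if not entails(old, new)]
    chain.append(new)
    return True


def entails_by_models(phi: FrozenSet[int], psi: FrozenSet[int]) -> bool:
    """Toy entailment: phi |= psi iff every model of phi is a model of psi."""
    return phi <= psi
```

With this toy entailment, inserting {1} and then {1, 2} leaves only {1, 2} in the chain, while a later insertion of {2} is rejected.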

A *safety invariant* for A is a function I : (Q → B) → 2<sup>**x**→Data<sup>I</sup></sup> such that, for every Boolean valuation β : Q → B, every valuation ν : **x** → Data<sup>I</sup> of the data variables and every finite sequence u ∈ Σ<sup>∗</sup> of input events, the following hold:

1. I, β ∪ ν |= Post<sub>A</sub>(ι, u) ⇒ ν ∈ I(β), and
2. ν ∈ I(β) ⇒ I, β ∪ ν ⊭ Acc<sub>A</sub>(u).

If I satisfies only the first point above, we call it an *invariant*. Intuitively, a safety invariant maps every Boolean valuation into a set of data valuations that contains the initial configuration ι ≡ Post<sub>A</sub>(ι, ε), whose data variables are unconstrained, over-approximates the set of reachable valuations (point 1) and excludes the valuations satisfying the acceptance condition (point 2). A formula φ(Q, **x**) is said to *define* I iff for all β : Q → B and ν : **x** → Data<sup>I</sup>, we have I, β ∪ ν |= φ iff ν ∈ I(β).

**Lemma 2.** *For any automaton* A*, we have* L(A) = ∅ *if and only if* A *has a safety invariant.*

Turning back to the issue of divergence of language emptiness semi-algorithms in the case L(A) = ∅, we can observe that an enumeration of input sequences u<sub>1</sub>, u<sub>2</sub>,... ∈ Σ<sup>∗</sup> can stop at step k as soon as ⋁<sub>i=1</sub><sup>k</sup> Post<sub>A</sub>(ι, u<sub>i</sub>) defines a safety invariant for A. Although this condition can be effectively checked using a decision procedure for the theory T(S, I), there is no guarantee that this check will ever succeed.

The solution we adopt in the sequel is abstraction to ensure the termination of invariant computations. However, it is worth pointing out from the start that abstraction alone will only allow us to build invariants that are not necessarily safety invariants. To meet the latter condition, we resort to counterexample guided abstraction refinement (CEGAR).

<sup>2</sup> Since each state occurs positively in Acc<sub>A</sub>(*u*), this formula has a model iff it has a model with every *q* ∈ *F* set to true.

<sup>3</sup> Given a partial order (*D*, ⪯), an antichain is a set *A* ⊆ *D* such that *a* ⋠ *b* for any distinct *a*, *b* ∈ *A*.

Formally, we fix a set of formulae Π ⊆ Form(Q, **x**), such that ⊥ ∈ Π, and refer to these formulae as *predicates*. Given a formula φ, we denote by φ<sup>♯</sup> ≡ ⋀{π ∈ Π | φ |= π} the abstraction of φ w.r.t. the predicates in Π. The abstract versions of the post-image and acceptance condition are defined as follows:

$$\begin{array}{l} \mathsf{Post}\_{\mathcal{A}}^{\sharp}(\phi,\varepsilon) \equiv \phi \qquad \mathsf{Post}\_{\mathcal{A}}^{\sharp}(\phi,ua) \equiv \left(\mathsf{Post}\_{\mathcal{A}}(\mathsf{Post}\_{\mathcal{A}}^{\sharp}(\phi,u),a)\right)^{\sharp} \\ \mathsf{Acc}\_{\mathcal{A}}^{\sharp}(u) \equiv \mathsf{Post}\_{\mathcal{A}}^{\sharp}(\iota,u) \wedge \bigwedge\_{q \in Q\backslash F} (q \to \bot), \text{ for any } u \in \Sigma^{\*} \end{array}$$
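The abstraction of a formula is the conjunction of all predicates that the formula entails. The Python sketch below illustrates this under a strong simplifying assumption: formulae are modelled as finite sets of models over a small universe, so that entailment is set inclusion and conjunction is intersection. The name `abstract` and the concrete sets are hypothetical.

```python
from typing import FrozenSet, Iterable

Models = FrozenSet[int]


def abstract(phi: Models, predicates: Iterable[Models],
             universe: Models) -> Models:
    """Conjunction of all predicates entailed by phi.

    The empty conjunction is true, modelled here by the whole universe.
    """
    result = universe
    for pi in predicates:
        if phi <= pi:              # phi |= pi
            result = result & pi   # add pi to the conjunction
    return result


UNIVERSE: Models = frozenset(range(6))
BOTTOM: Models = frozenset()       # the predicate false, always part of Π
```

When Π contains only ⊥, every satisfiable formula is abstracted to ⊤, and adding predicates can only make the abstraction more precise.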

**Lemma 3.** *For any bijection* μ : ℕ → Σ<sup>∗</sup>*, there exists* k > 0 *such that* ⋁<sub>i=0</sub><sup>k</sup> Post<sup>♯</sup><sub>A</sub>(ι, μ(i)) *defines an invariant* I<sup>♯</sup> *for* A*.*

We are left with fulfilling point (2) from the definition of a safety invariant. To this end, suppose that, for a given set Π of predicates, the invariant I<sup>♯</sup>, defined by the previous lemma, meets point (1) but not point (2), where Post<sup>♯</sup><sub>A</sub> and Acc<sup>♯</sup><sub>A</sub> replace Post<sub>A</sub> and Acc<sub>A</sub>, respectively. In other words, there exists a finite sequence u ∈ Σ<sup>∗</sup> such that ν ∈ I<sup>♯</sup>(β) and I, β ∪ ν |= Acc<sup>♯</sup><sub>A</sub>(u), for some Boolean β : Q → B and data ν : **x** → Data<sup>I</sup> valuations. Such a u ∈ Σ<sup>∗</sup> is called a *counterexample*.

Once a counterexample u is discovered, there are two possibilities. Either (i) Acc<sub>A</sub>(u) is satisfiable, in which case u is *feasible* and L(A) ≠ ∅, or (ii) Acc<sub>A</sub>(u) is unsatisfiable, in which case u is *spurious*. In the first case, our semi-algorithm stops and returns a witness for non-emptiness, obtained from the satisfying valuation of Acc<sub>A</sub>(u). In the second case, we must strengthen the invariant by excluding from I<sup>♯</sup> all pairs (β, ν) such that I, β ∪ ν |= Acc<sup>♯</sup><sub>A</sub>(u). This strengthening is carried out by adding to Π several predicates that are sufficient to exclude the spurious counterexample.

Given an unsatisfiable conjunction of formulae ψ<sub>1</sub> ∧ ... ∧ ψ<sub>n</sub>, an *interpolant* is a tuple of formulae ⟨I<sub>1</sub>,...,I<sub>n−1</sub>, I<sub>n</sub>⟩ such that I<sub>n</sub> ≡ ⊥, I<sub>i</sub> ∧ ψ<sub>i</sub> |=<sub>T</sub> I<sub>i+1</sub> and I<sub>i</sub> contains only variables and function symbols that are common to ψ<sub>i</sub> and ψ<sub>i+1</sub>, for all i ∈ [n − 1]. Moreover, by Lyndon's Interpolation Theorem [16], we can assume without loss of generality that every Boolean variable with at least one positive (negative) occurrence in I<sub>i</sub> has at least one positive (negative) occurrence in both ψ<sub>i</sub> and ψ<sub>i+1</sub>. In the following, we shall assume the existence of an interpolating decision procedure for T(S, I) that meets the requirements of Lyndon's Interpolation Theorem.

A classical method for abstraction refinement is to add the elements of the interpolant obtained from a proof of spuriousness to the set of predicates. This guarantees progress, meaning that the particular spurious counterexample, from which the interpolant was generated, will never be revisited in the future. Though not always, in many practical test cases this progress property eventually yields a safety invariant.

Given a non-empty spurious counterexample u = a<sub>1</sub>...a<sub>n</sub>, where n > 0, we consider the following interpolation problem:

$$\begin{aligned} \Theta(u) \equiv{} & \theta\_0(Q\_0) \land \theta\_1(Q\_0 \cup Q\_1, \mathbf{x}\_0 \cup \mathbf{x}\_1) \land \dots \\ & \land \theta\_n(Q\_{n-1} \cup Q\_n, \mathbf{x}\_{n-1} \cup \mathbf{x}\_n) \land \theta\_{n+1}(Q\_n) \end{aligned} \tag{1}$$

where Q<sub>k</sub> = {q<sub>k</sub> | q ∈ Q}, k ∈ [0, n], are time-stamped sets of Boolean variables corresponding to the set Q of states of A. The first conjunct θ<sub>0</sub>(Q<sub>0</sub>) ≡ ι[Q<sub>0</sub>/Q] is the initial configuration of A, with every q ∈ FV<sup>Bool</sup>(ι) replaced by q<sub>0</sub>. The definition of θ<sub>k</sub>, for all k ∈ [1, n], uses *replacement sets* R<sub>ℓ</sub> ⊆ Q<sub>ℓ</sub>, ℓ ∈ [0, n], which are defined inductively below:

$$\begin{array}{l} - R\_{0} = \mathrm{FV}^{\mathsf{Bool}}(\theta\_{0}),\\ - \theta\_{\ell} \equiv \bigwedge\_{q\_{\ell-1} \in R\_{\ell-1}} (q\_{\ell-1} \rightarrow \Delta(q, a\_{\ell})[Q\_{\ell}/Q, \mathbf{x}\_{\ell-1}/\overline{\mathbf{x}}, \mathbf{x}\_{\ell}/\mathbf{x}]) \text{ and } R\_{\ell} = \mathrm{FV}^{\mathsf{Bool}}(\theta\_{\ell}) \cap Q\_{\ell}, \text{ for each } \ell \in [1, n],\\ - \theta\_{n+1}(Q\_{n}) \equiv \bigwedge\_{q \in Q\backslash F} (q\_{n} \rightarrow \bot). \end{array}$$
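The inductive definition above can be read as a single pass over the counterexample. The Python sketch below builds the conjuncts and the replacement sets for a toy automaton, under an illustrative encoding: formulae are nested tuples, data atoms are opaque strings, a time stamp ℓ on an atom stands for the substitution of **x̄** by **x**<sub>ℓ−1</sub> and of **x** by **x**<sub>ℓ</sub>, and each θ<sub>ℓ</sub> with ℓ ≥ 1 is a list of implications (antecedent, consequent). All function names and the toy rules are hypothetical.

```python
def stamp(phi, level):
    """Attach the time stamp `level` to every state and atom of a rule."""
    kind = phi[0]
    if kind in ("and", "or"):
        return (kind, stamp(phi[1], level), stamp(phi[2], level))
    if kind in ("state", "atom"):
        return (kind, phi[1], level)
    return phi                                   # ("true",) or ("false",)


def states_at(phi, level):
    """States of a stamped formula that carry the given time stamp."""
    kind = phi[0]
    if kind in ("and", "or"):
        return states_at(phi[1], level) | states_at(phi[2], level)
    if kind == "state" and phi[2] == level:
        return {phi[1]}
    return set()


def interpolation_problem(iota, delta, states, final, events):
    """Return the conjuncts theta_0..theta_{n+1} and the sets R_0..R_n."""
    theta = [stamp(iota, 0)]
    repl = [states_at(theta[0], 0)]
    for level, event in enumerate(events, start=1):
        # One implication per state replaced at this step.
        conj = [(("state", q, level - 1),
                 stamp(delta.get((q, event), ("false",)), level))
                for q in sorted(repl[level - 1])]
        theta.append(conj)
        repl.append(set().union(*(states_at(rule, level) for _, rule in conj)))
    n = len(events)
    # Acceptance: the remaining non-final states are forced to false.
    theta.append([(("state", q, n), ("false",))
                  for q in sorted(set(states) - set(final))])
    return theta, repl


# Toy rules (hypothetical), with x_prev standing for the previous value of x.
TOY_DELTA = {
    ("q0", "a"): ("and", ("state", "q1"), ("atom", "x = 0")),
    ("q1", "a"): ("and", ("state", "q1"), ("atom", "x = x_prev + 1")),
    ("q1", "b"): ("atom", "x_prev >= 2"),
}
```

Unlike the formulae of a run, the conjuncts keep the time-stamped states as antecedents of implications, so each step remains visible to an interpolating solver.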

The intuition is that R<sub>0</sub>,...,R<sub>n</sub> are the sets of states replaced, θ<sub>0</sub>,...,θ<sub>n</sub> are the sets of transition rules fired on the run of A over u and θ<sub>n+1</sub> is the acceptance condition, which forces the last remaining non-final states to be false. We recall that a run of A over u is a sequence:

$$
\phi\_0(Q) \Rightarrow \phi\_1(Q, \mathbf{x}\_0 \cup \mathbf{x}\_1) \Rightarrow \dots \Rightarrow \phi\_n(Q, \mathbf{x}\_0 \cup \dots \cup \mathbf{x}\_n)
$$

where φ<sub>0</sub> is the initial configuration ι and for each k > 0, φ<sub>k</sub> is obtained from φ<sub>k−1</sub> by replacing each state q ∈ FV<sup>Bool</sup>(φ<sub>k−1</sub>) by the formula Δ(q, a<sub>k</sub>)[**x**<sub>k−1</sub>/**x̄**, **x**<sub>k</sub>/**x**], given by the transition function of A. Observe that, because the states are replaced with transition formulae when moving one step in a run, these formulae lose track of the control history and are not suitable for producing interpolants that relate states and data.

The main idea behind the above definition of the interpolation problem is that we would like to obtain an interpolant ⟨⊤, I<sub>0</sub>(Q), I<sub>1</sub>(Q, **x**),...,I<sub>n</sub>(Q, **x**), ⊥⟩ whose formulae *combine states with the data constraints that must hold locally*, whenever the control reaches a certain Boolean configuration. This association of states with data valuations is key to defining efficient semi-algorithms, based on lazy abstraction [8]. Furthermore, the abstraction defined by the interpolants generated in this way can also *over-approximate the control structure* of an automaton, in addition to the sets of data values encountered throughout its runs.

The correctness of this interpolation-based abstraction refinement setup is captured by the progress property below, which guarantees that adding the formulae of an interpolant for Θ(u) to the set Π of predicates suffices to exclude the spurious counterexample u from future searches.

**Lemma 4.** *For any sequence* u = a<sub>1</sub>...a<sub>n</sub> ∈ Σ<sup>∗</sup>*, if* Acc<sub>A</sub>(u) *is unsatisfiable, the following hold:*


### **5 Lazy Predicate Abstraction for ADA Emptiness**

We now have all the ingredients to describe the first emptiness checking semi-algorithm for alternating data automata. Algorithm<sup>4</sup> 1 builds an *abstract reachability tree* (ART) whose nodes are labeled with formulae over-approximating the concrete sets of configurations, and a covering relation between nodes in order to ensure that the set of formulae labeling the nodes in the ART forms an antichain. Any spurious counterexample is eliminated by computing an interpolant and adding its formulae to the set of predicates (cf. Lemma 4). Formally, an ART is a tuple T = ⟨N, E, r, Λ, R, T, ◁⟩, where:


Each node n ∈ N corresponds to a unique path from the root to n, labeled by a sequence λ(n) ∈ Σ<sup>∗</sup> of input events. The *least infeasible suffix* of λ(n) is the smallest sequence v = a<sub>1</sub>...a<sub>k</sub> such that λ(n) = wv, for some w ∈ Σ<sup>∗</sup>, and the following formula is unsatisfiable:

$$\Psi(v) \equiv \Lambda(p)[Q\_0/Q] \land \theta\_1(Q\_0 \cup Q\_1, \mathbf{x}\_0 \cup \mathbf{x}\_1) \land \dots \land \theta\_{k+1}(Q\_k) \tag{2}$$

where θ<sub>1</sub>,...,θ<sub>k+1</sub> are defined as in (1) and θ<sub>0</sub> ≡ Λ(p)[Q<sub>0</sub>/Q]. The *pivot* of n is the node p corresponding to the start of the least infeasible suffix. We assume the existence of two functions FindPivot(u, T) and LeastInfeasibleSuffix(u, T) that return the pivot and least infeasible suffix of a sequence u ∈ Σ<sup>∗</sup> in an ART T, without detailing their implementation.
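Since the two functions are left unspecified, the following Python sketch shows only one possible way to realize them, and is not taken from the implementation discussed in this paper. It assumes a hypothetical oracle `is_infeasible_from(j)`, which stands for an unsatisfiability check of the formula (2) built for the suffix that starts at the j-th node of the path, and it simply tries the suffixes from the shortest to the longest.

```python
from typing import Callable, List, Sequence, Tuple


def least_infeasible_suffix(
        events: Sequence[str],
        is_infeasible_from: Callable[[int], bool]) -> Tuple[int, List[str]]:
    """Return (pivot index, suffix) for a spurious counterexample.

    The pivot index j refers to the j-th node on the path from the root
    (the root has index 0) and the suffix is events[j:]. The empty suffix
    is skipped, on the assumption that a counterexample candidate has a
    satisfiable abstract acceptance condition at its last node.
    """
    for j in range(len(events) - 1, -1, -1):     # shortest suffix first
        if is_infeasible_from(j):
            return j, list(events[j:])
    raise ValueError("no infeasible suffix: the counterexample is feasible")
```

A linear scan issues one solver query per candidate pivot; whether a cheaper search is possible depends on properties of the node labels that are not assumed here.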

With these considerations, Algorithm 1 uses a worklist iteration to build an ART. We keep newly expanded nodes of T in a queue WorkList, thus implementing a breadth-first exploration strategy, which guarantees that the shortest counterexamples are explored first. When the search encounters a counterexample candidate u, it is checked for spuriousness. If the counterexample is feasible, the procedure returns a data word w ∈ L(A), which interleaves the input events of u with the data valuations from the model of Acc<sub>A</sub>(u) (since u is feasible, Acc<sub>A</sub>(u) is satisfiable). Otherwise, u is spurious and we compute its pivot p (line 12), add the interpolants for the least infeasible suffix of u to Π, and remove and recompute the subtree of T rooted at p.

Termination of Algorithm 1 depends on the ability of a given interpolating decision procedure for the combined Boolean and data theory T(S, I) to provide interpolants that yield a safety invariant, whenever L(A) = ∅. In this case, we use the covering relation ◁ to ensure that, when a newly generated node is covered by a node already in N, it is not added to the worklist, thus cutting the current branch of the search.

<sup>4</sup> Though termination is not guaranteed, we call it algorithm for conciseness.

**Algorithm 1.** Lazy Predicate Abstraction for ADA Emptiness

    input:  an ADA A = ⟨x, Q, ι, F, Δ⟩ over the alphabet Σ of input events
    output: true if L(A) = ∅ and a data word w ∈ L(A) otherwise

     1: let T = ⟨N, E, r, Λ, ◁⟩ be an ART
     2: initially N = E = ◁ = ∅, Λ = {(r, ι)}, Π = {⊥}, WorkList = ⟨r⟩
     3: while WorkList ≠ ∅ do
     4:   dequeue n from WorkList
     5:   N ← N ∪ {n}
     6:   let λ(n) = a_1 ... a_k be the label of the path from r to n
     7:   if Acc♯_A(λ(n)) is satisfiable then               ▷ counterexample candidate
     8:     if Acc_A(λ(n)) is satisfiable then               ▷ feasible counterexample
     9:       get model (β, ν_1, ..., ν_k) of Acc_A(λ(n))
    10:       return w = (a_1, ν_1) ... (a_k, ν_k)           ▷ w ∈ L(A) by construction
    11:     else                                             ▷ spurious counterexample
    12:       p ← FindPivot(λ(n), T)
    13:       v ← LeastInfeasibleSuffix(λ(n), T)
    14:       Π ← Π ∪ {I_0, ..., I_ℓ}, where ⟨⊤, I_0, ..., I_ℓ, ⊥⟩ is an interpolant for Ψ(v)
    15:       let S = ⟨N′, E′, p, Λ′, ◁′⟩ be the subtree of T rooted at p
    16:       for (m, q) ∈ ◁ such that q ∈ N′ do
    17:         remove m from N and enqueue m into WorkList
    18:       remove S from T
    19:       enqueue p into WorkList                        ▷ recompute the subtree rooted at p
    20:   else
    21:     for a ∈ Σ do                                     ▷ expand n
    22:       φ ← Post♯_A(Λ(n), a)
    23:       if there exists m ∈ N such that φ |= Λ(m) then
    24:         ◁ ← ◁ ∪ {(n, m)}                             ▷ m covers n
    25:       else
    26:         let s be a fresh node
    27:         E ← E ∪ {(n, a, s)}
    28:         Λ ← Λ ∪ {(s, φ)}
    29:         R ← {m ∈ WorkList | Λ(m) |= φ}               ▷ worklist nodes covered by s
    30:         for r ∈ R do
    31:           for m ∈ N such that (m, b, r) ∈ E, b ∈ Σ do
    32:             ◁ ← ◁ ∪ {(m, s)}                         ▷ redirect covered children from R into s
    33:           for (m, r) ∈ ◁ do
    34:             ◁ ← ◁ ∪ {(m, s)}                         ▷ redirect covered nodes from R into s
    35:         remove R from T
    36:         enqueue s into WorkList
    37: return true

Formally, for any two nodes n, m ∈ N, we have n ◁ m iff Post<sub>A</sub>(Λ(n), a) ⊨ Λ(m) for some a ∈ Σ, in other words, if n has a successor whose label entails the label of m.

*Example.* Consider the automaton given in Fig. 1. First, Algorithm 1 fires the sequence a, and since there are no formulae other than ⊥ in Π, the successor of ι ≡ q<sub>0</sub> is ⊤, in Fig. 2(a). The spuriousness check for a yields the root of the ART as pivot and the interpolant ⟨q<sub>0</sub>, q<sub>1</sub>⟩, which is added to the set Π. Then the node is removed and the next time a is fired, it creates a node labeled q<sub>1</sub>. The second sequence aa creates a successor node q<sub>1</sub>, which is covered by the first, depicted with a dashed arrow, in Fig. 2(b). The third sequence is ab, which results in a new uncovered node and triggers a spuriousness check. The new predicate obtained from this check is x ≤ 0 ∧ q<sub>2</sub> ∧ y ≥ 0 and the pivot is again the root. Then the entire ART is rebuilt with the new predicates and the fourth sequence aab yields an uncovered node, in Fig. 2(c). The new pivot is the endpoint of a and the newly added predicates are q<sub>1</sub> ∧ q<sub>2</sub> and y > x − 1 ∧ q<sub>2</sub>. Finally, the ART is rebuilt from the pivot node and all nodes are covered, thus proving the emptiness of the automaton, in Fig. 2(d).

The correctness of Algorithm 1 is stated by the theorem below:

**Fig. 2.** Proving emptiness of the automaton from Fig. 1 by Algorithm 1

**Theorem 1.** *Given an automaton* A*, such that* L(A) ≠ ∅*, Algorithm 1 terminates and returns a word* w ∈ L(A)*. If Algorithm 1 terminates reporting* true*, then* L(A) = ∅*.*

### **6 Checking ADA Emptiness with** Impact

As pointed out by a number of authors, the bottleneck of predicate abstraction is the high cost of reconstructing parts of the ART, subsequent to the refinement of the set of predicates. The main idea of the Impact procedure [17] is that this can be avoided and the refinement (strengthening of the node labels of the ART) can be performed in-place. This refinement step requires an update of the covering relation, because a node that used to cover another node might not cover it after the strengthening of its label.

We consider a total alphabetical order ≺ on Σ and lift it to the total lexicographical order ≺<sup>∗</sup> on Σ<sup>∗</sup>. A node n ∈ N is *covered* if (n, p) ∈ ◁ or it has an ancestor m such that (m, p) ∈ ◁, for some p ∈ N. A node n is *closed* if it is covered, or Λ(n) ⊭ Λ(m) for all m ∈ N such that λ(m) ≺<sup>∗</sup> λ(n). Observe that we use the coverage relation here with a different meaning than in Algorithm 1.
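The two definitions above can be illustrated with a small sketch. It assumes, purely for illustration, that each node is identified by the string λ(n) labelling its path from the root, that Python's built-in string comparison plays the role of ≺<sup>∗</sup>, and that a label is a finite set of states so that entailment becomes set inclusion. The names `is_covered` and `is_closed` are hypothetical and are not taken from the authors' tool.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Cover = Set[Tuple[str, str]]  # pairs (covered node, covering node)


def is_covered(node: str, cover: Cover) -> bool:
    """A node is covered if it, or one of its ancestors, is the source of a covering edge."""
    sources = {m for (m, _) in cover}
    # With path-labelled nodes, the ancestors of `node` are exactly its proper prefixes.
    return any(node[:i] in sources for i in range(len(node) + 1))


def is_closed(node: str, nodes: Iterable[str],
              label: Dict[str, FrozenSet[int]], cover: Cover) -> bool:
    """A node is closed if it is covered, or if no earlier node could cover it."""
    if is_covered(node, cover):
        return True
    # Assumption: labels are sets of states, so "entails" is the subset test `<=`.
    return all(not label[node] <= label[m] for m in nodes if m < node)


nodes = ["", "a", "aa"]
label = {"": frozenset({0}), "a": frozenset({1, 2}), "aa": frozenset({1})}

print(is_covered("aab", {("aa", "a")}))      # an ancestor of "aab" is covered
print(is_closed("aa", nodes, label, set()))  # "aa" could still be covered by "a"
```

In this toy setting the node `"aa"` is not closed until a covering edge to `"a"` is recorded, which is the situation the Close phase is meant to resolve.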

The execution of Algorithm 2 consists of three phases<sup>5</sup>: *close*, *refine* and *expand*. Let n be a node removed from the worklist at line 4. If Acc<sub>A</sub>(λ(n))

<sup>5</sup> Corresponding to the Close, Refine and Expand in [17].

#### **Algorithm 2.** Impact for ADA Emptiness

```
input:  an ADA A = ⟨x, Q, ι, F, Δ⟩ over the alphabet Σ of input events
output: true if L(A) = ∅ and a data word w ∈ L(A) otherwise
 1: let T = ⟨N, E, r, Λ, R, T, ◁⟩ be an ART
 2: initially N = E = T = ◁ = ∅, Λ = {(r, ι)}, R = FVBool(ι[Q0/Q]), WorkList = {r}
 3: while WorkList ≠ ∅ do
 4:   dequeue n from WorkList
 5:   N ← N ∪ {n}
 6:   let (r, a1, n1), (n1, a2, n2), ..., (nk−1, ak, n) be the path from r to n
 7:   if Acc_A(a1 ... ak) is satisfiable then    ▷ counterexample is feasible
 8:     get model (β, ν1, ..., νk) of Acc_A(λ(n))
 9:     return w = (a1, ν1) ... (ak, νk)    ▷ w ∈ L(A) by construction
10:   else    ▷ spurious counterexample
11:     let ⟨⊤, I0, ..., Ik, ⊥⟩ be an interpolant for Θ(a1 ... ak)
12:     b ← false
13:     for i = 0, ..., k do
14:       if Λ(ni) ⊭ Ii then
15:         ◁ ← ◁ \ {(m, ni) ∈ ◁ | m ∈ N}
16:         Λ(ni) ← Λ(ni) ∧ Ii    ▷ strengthen the label of ni
17:         if ¬b then
18:           b ← Close(ni)
19:   if n is not covered then
20:     for a ∈ Σ do    ▷ expand n
21:       let s be a fresh node and e = (n, a, s) be a new edge
22:       E ← E ∪ {e}
23:       Λ ← Λ ∪ {(s, ⊤)}
24:       T ← T ∪ {(e, θk)}
25:       R ← R ∪ {(s, ⋃_{q ∈ R(n)} FVBool(Δ(q, a)))}
26:       enqueue s into WorkList
27: return true

 1: function Close(x) returns Bool
 2:   for y ∈ N such that λ(y) ≺∗ λ(x) do
 3:     if Λ(x) ⊨ Λ(y) then
 4:       ◁ ← (◁ \ {(p, q) ∈ ◁ | q is x or a successor of x}) ∪ {(x, y)}
 5:       return true
 6:   return false
```
is satisfiable, the counterexample λ(n) is feasible, in which case a model of Acc<sub>A</sub>(λ(n)) is obtained and a word w ∈ L(A) is returned. Otherwise, λ(n) is a spurious counterexample and the procedure enters the refinement phase (lines 11–18). The interpolant for Θ(λ(n)) (cf. formula 1) is used to strengthen the labels of all the ancestors of n, by conjoining the formulae of the interpolant to the existing labels.

In this process, the nodes on the path between r and n, including n, might become eligible for coverage, therefore we attempt to close each ancestor of n that is impacted by the refinement (line 18). Observe that, in this case, the call to Close must uncover each node which is covered by a successor of n (line 4 of the Close function). This is required because, due to the over-approximation of the sets of reachable configurations, the covering relation is not transitive, as explained in [17]. If Close adds a covering edge (n<sub>i</sub>, m) to ◁, it does not have to be called for the successors of n<sub>i</sub> on this path, which is handled via the Boolean flag b.
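The refinement and closing steps described above can be sketched on a toy ART. This is only an illustration under strong assumptions: nodes are identified by their path labels (strings), Python's string order stands in for ≺<sup>∗</sup>, labels and interpolants are finite sets of states (so conjunction is intersection and entailment is inclusion), and only the interpolant components attached to the path nodes are passed in. The function names `close` and `refine` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Cover = Set[Tuple[str, str]]  # pairs (covered node, covering node)
Labels = Dict[str, FrozenSet[int]]


def close(x: str, nodes: List[str], label: Labels, cover: Cover) -> bool:
    """Try to cover x by an earlier node y, in the spirit of the function Close."""
    for y in sorted(nodes):
        if y < x and label[x] <= label[y]:
            # Drop edges whose covering node is x or a descendant of x,
            # because covering is not transitive.
            cover.difference_update({(p, q) for (p, q) in cover if q.startswith(x)})
            cover.add((x, y))
            return True
    return False


def refine(path: List[str], interpolants: List[FrozenSet[int]],
           nodes: List[str], label: Labels, cover: Cover) -> None:
    """Strengthen the labels along a spurious path in place."""
    b = False
    for node, interpolant in zip(path, interpolants):
        if not label[node] <= interpolant:
            # The stronger label may no longer cover what it used to cover.
            cover.difference_update({(m, q) for (m, q) in cover if q == node})
            label[node] = label[node] & interpolant
            if not b:
                b = close(node, nodes, label, cover)


everything = frozenset(range(4))  # plays the role of the label "true"
nodes = ["", "a", "aa", "ab"]
label = {"": frozenset({0}), "a": frozenset({1, 2}), "aa": everything, "ab": everything}
cover = {("ab", "aa")}

refine(["", "a", "aa"], [frozenset({0}), frozenset({1, 2}), frozenset({1})],
       nodes, label, cover)
print(label["aa"], cover)
```

In this run the label of `"aa"` is strengthened, so the old edge from `"ab"` to `"aa"` is dropped and `"aa"` itself becomes covered by `"a"`, without rebuilding any part of the tree.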

Finally, if n is still uncovered (it has not been previously covered during the refinement phase) we expand n (lines 20–26) by creating a new node for each successor s via the input event a ∈ Σ and inserting it into the worklist.

**Fig. 3.** Proving emptiness of the automaton from Fig. 1 by Algorithm 2

*Example.* We show the execution of Algorithm 2 on the automaton from Fig. 1. Initially, the procedure fires the sequence a, whose endpoint is labeled with ⊤, in Fig. 3(a). Since this node is uncovered, we check the spuriousness of the counterexample a and refine the label of the node to q<sub>1</sub>. Since the node is still uncovered, two successors, labeled with ⊤, are computed, corresponding to the sequences aa and ab, in Fig. 3(b). The spuriousness check for aa yields the interpolant ⟨q<sub>0</sub>, x ≤ 0 ∧ q<sub>2</sub> ∧ y ≥ 0⟩, which strengthens the label of the endpoint of a from q<sub>1</sub> to q<sub>1</sub> ∧ x ≤ 0 ∧ q<sub>2</sub> ∧ y ≥ 0. The sequence ab is also found to be spurious, which changes the label of its endpoint from ⊤ to ⊥, and also covers it (depicted with a dashed edge). Since the endpoint of aa is not covered, it is expanded to aaa and aab, in Fig. 3(c). Both sequences aaa and aab are found to be spurious, and the endpoint of aab, whose label has changed from ⊤ to ⊥, is now covered. In the process, the label of aa has also changed from q<sub>1</sub> to q<sub>1</sub> ∧ y > x − 1 ∧ q<sub>2</sub>, due to the strengthening with the interpolant from aab. Finally, the only uncovered node aaa is expanded to aaaa and aaab, both found to be spurious, in Fig. 3(d). The refinement of aaab causes the label of aaa to change from q<sub>1</sub> to q<sub>1</sub> ∧ y > x − 1 ∧ q<sub>2</sub> and this node is now covered by aa. Since its successors are also covered, there are no uncovered nodes and the procedure returns true. The correctness of Algorithm 2 is stated by the theorem below:

**Theorem 2.** *Given an automaton* A*, such that* L(A) ≠ ∅*, Algorithm 2 terminates and returns a word* w ∈ L(A)*. If Algorithm 2 terminates reporting* true*, then* L(A) = ∅*.*

### **7 Experimental Evaluation**

We have implemented both Algorithms 1 and 2 in a prototype tool<sup>6</sup> that uses the MathSAT5 SMT solver<sup>7</sup> via the Java SMT interface<sup>8</sup> for the satisfiability queries and interpolant generation, in the theory of linear integer arithmetic with uninterpreted Boolean functions (UFLIA). We compared both algorithms with a previous implementation of a trace inclusion procedure, called Includer<sup>9</sup>, that uses on-the-fly determinisation and lazy predicate abstraction with interpolant-based refinement [11] in the LIA theory. The datasets generated and/or analysed during the current study are available in the figshare repository: https://doi.org/10.6084/m9.figshare.5925472.v1 [12].


**Table 1.**

The results of the experiments are given in Table 1. We applied the tool first to several array logic entailments, which occur as verification conditions for imperative programs with arrays [2] (array shift, array simple, array rotation1+2)

<sup>6</sup> The implementation is available at https://github.com/cathiec/JAltImpact.

<sup>7</sup> http://mathsat.fbk.eu/.

<sup>8</sup> https://github.com/sosy-lab/java-smt.

<sup>9</sup> http://www.fit.vutbr.cz/research/groups/verifit/tools/includer/.

available online [19]. Next, we applied it to proving safety properties of hardware circuits (hw1+2) [22]. Finally, we considered two timed communication protocols, consisting of systems that are asynchronous compositions of timed automata, whose correctness specifications are given by timed automata monitors: a timed version of the Alternating Bit Protocol (abp) [25] and a controller of a railroad crossing (train) [9]. All results were obtained on an x86_64 Linux Ubuntu virtual machine with 8 GB of RAM running on an Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00 GHz. The automata sizes are given in bytes needed to store their ASCII description on file and the execution times are in seconds.

As in the case of non-alternating nondeterministic integer programs [17], the alternating version of Impact (Algorithm 2) outperforms lazy predicate abstraction for checking emptiness by at least one order of magnitude. Moreover, Impact is comparable, on average, to the previous implementation of Includer, which also uses MathSAT5, via the C API. We believe the reason why Includer outperforms Impact on some examples is the hardness of the UFLIA entailment checks used in Algorithm 2 (line 14 of the main loop and line 3 of the function Close) as opposed to the pure LIA entailment checks used in Includer. According to our statistics, Algorithm 2 spends more than 50% of the time waiting for the SMT solver to finish answering entailment queries.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Revisiting Enumerative Instantiation**

Andrew Reynolds<sup>1</sup>, Haniel Barbosa<sup>1,2</sup>, and Pascal Fontaine<sup>2</sup>

> <sup>1</sup> University of Iowa, Iowa City, USA, andrew.j.reynolds@gmail.com
>
> <sup>2</sup> Université de Lorraine, CNRS, Inria, LORIA, Nancy, France, {haniel.barbosa,pascal.fontaine}@inria.fr

**Abstract.** Formal methods applications often rely on SMT solvers to automatically discharge proof obligations. SMT solvers handle quantified formulas using incomplete heuristic techniques like E-matching, and often resort to model-based quantifier instantiation (MBQI) when these techniques fail. This paper revisits enumerative instantiation, a technique that considers instantiations based on exhaustive enumeration of ground terms. Although simple, we argue that enumerative instantiation can supplement other instantiation techniques and be a viable alternative to MBQI for valid proof obligations. We first present a stronger Herbrand Theorem, better suited as a basis for the instantiation loop used in SMT solvers; it furthermore requires considering fewer instances than classical Herbrand instantiation. Based on this result, we present different strategies for combining enumerative instantiation with other instantiation techniques in an effective way. The experimental evaluation shows that the implementation of these new techniques in the SMT solver CVC4 leads to significant improvements in several benchmark libraries, including many stemming from verification efforts.

### **1 Introduction**

In many formal methods applications, such as verification, it is common to represent proof obligations in terms of the *Satisfiability Modulo Theories* (SMT) problem. SMT solvers have thus become popular backends for such applications. They have been primarily designed to be decision procedures for quantifier-free problems, on which they are highly efficient and capable of handling large formulas over background theories. Quantified formulas are generally handled with instantiation techniques that are often incomplete, even on decidable or semidecidable fragments. Relying heavily on incomplete heuristics, however, leads to instability and unpredictability in the solver's behavior, which is undesirable for the tools that depend on it. To address these issues some systems use model-based quantifier instantiation (MBQI) [19], a complete technique for first-order logic with equality and for several restricted fragments containing theories, which can be used as a fallback strategy to the incomplete techniques.

In this paper we introduce a novel enumerative instantiation technique which can serve as a simpler alternative to model-based instantiation. Similar to MBQI, our technique can be used as a secondary strategy when incomplete techniques fail. Our experiments show that a careful implementation of this technique in the state-of-the-art SMT solver CVC4 leads to noticeable gains in performance on unsatisfiable problems.

*Background.* Some of the earliest tools for theorem proving in first-order logic come from the work by Skolem and Herbrand. The Herbrand Theorem states that if a closed formula in Skolem normal form, i.e. a prenex formula without existential quantifiers, is unsatisfiable, then there is an unsatisfiable finite conjunction of Herbrand instances of the formula, that is, instances on terms from the *Herbrand universe*, i.e. the set of all possible well-sorted ground terms in the formula's signature. The first theorem provers for first-order logic to be implemented based on Herbrand's theorem employed a completely unguided search on the Herbrand universe (e.g. the early efforts of Gilmore [20] and Davis et al. [11]). Such systems were only capable of dealing with very simple formulas and were soon put aside. Techniques which would only generate Herbrand instances when needed were first introduced by Prawitz [24] and later refined by Davis and Putnam [12], culminating in the resolution calculus introduced by Robinson [30]. The most successful techniques for handling pure first-order logic have been based on resolution and ordering criteria [3]. More recently, techniques based on instantiation have shown promise for first-order logic as well [13,17,28]. Inspired by early work on the subject, this paper revisits whether modern implementations of the latter class of techniques can benefit from enumerative instantiation.

*Outline.* We first give preliminaries in Sect. 2. Then, we introduce a stronger Herbrand Theorem as the basis for making enumerative instantiation practical so that it can be used in modern systems in Sect. 3. We formalize the different instantiation strategies used by state-of-the-art SMT solvers, discuss their strengths and weaknesses, and present a schematization of how to combine such strategies in Sect. 4, with a focus on a new strategy for enumerative instantiation. An extensive experimental evaluation of enumerative instantiation as implemented in CVC4 is presented in Sect. 5.

### **2 Preliminaries**

We work in the context of many-sorted first-order logic with equality (see e.g. [16]) and assume the reader is familiar with the notions of signature, term, (quantified and ground) formula, atom, literal, free and bound variable, and substitution.

We consider signatures Σ containing a Bool sort and constants ⊤, ⊥ and a family of predicate symbols (≈ : τ × τ → Bool) interpreted as equality for each sort τ. Without loss of generality, we assume ≈ is the only predicate in Σ. We use = for syntactic equality. The set of all terms occurring in a formula ϕ (resp. term t) is denoted by **T**(ϕ) (resp. **T**(t)). We write t̄ for the sequence of terms t<sub>1</sub>, ..., t<sub>n</sub> for an unspecified n ∈ ℕ<sup>+</sup> that is either irrelevant or deducible from the context.

An *interpretation* is a triple *M* = (*D*, *I*, *V*) in which *D* is a collection of non-empty *domain sets* for all sorts in Σ, *I* interprets symbols by mapping them into functions over domain sets according to the symbol sort, and *V* maps free variables to elements of their respective domain sets. A *theory* is a pair *T* = (Σ, Ω) in which Σ is a signature and Ω is a class of interpretations denoted the *models of T*. The *empty theory* is the theory for which the class of interpretations Ω is unrestricted, which coincides with first-order logic with equality. Throughout this paper we assume a fixed background theory *T*, which unless otherwise stated is the empty theory. A formula ϕ is *satisfiable* (resp. *unsatisfiable*) *in T* if it is satisfied by some (resp. no) interpretation *M* ∈ Ω, written *M* ⊨<sub>T</sub> ϕ. A formula ϕ *entails in T* a formula ψ, written ϕ ⊨<sub>T</sub> ψ, if every interpretation in Ω satisfying ϕ also satisfies ψ. For these notions of model satisfaction and entailment in the empty theory, we omit the subscript.

A substitution σ maps variables to terms and its domain, dom(σ), is finite. We write ran(σ) to denote its range. Throughout the paper, conjunctions may be written as sets or tuples, and vice-versa, whenever convenient and unambiguous. All definitions are assumed to be lifted in the expected way from formulas into sets or tuples of formulas.

**Fig. 1.** The SMT instantiation loop for quantified formulas

#### **Instantiation-Based SMT Solvers**

Quantifiers in formulas are generally handled by SMT solvers through instantiation-based techniques, which capitalize on their capability to handle large ground formulas. In this approach, an input formula ψ is given to the ground SMT solver, which will abstract all atoms and quantified formulas and treat them as if they were propositional variables. The solver for ground formulas will provide an assignment E ∪ Q, where E is a set of ground literals and Q is a set of quantified formulas appearing in ψ, such that E ∪ Q propositionally entails ψ. We assume that all quantified formulas in ψ are of the form ∀x̄. ϕ with ϕ quantifier-free. This can be achieved by prenex form transformation and Skolemization. The instantiation module of the solver will then generate new ground formulas of the form ∀x̄. ϕ ⇒ ϕσ where ∀x̄. ϕ is a quantified formula in Q and σ is a substitution from the variables in ϕ to ground terms. These instances will be added conjunctively to the input of the ground solver, hence refining its knowledge of the quantified formulas. The ground solver may then provide another assignment E′ ∪ Q′, where this is a set that entails both ϕ and the newly added instances. This new assignment might either be the previous one, augmented by new ground literals coming from the new instances, or if the previous E has been refuted by the new instances, a completely different set. On the other hand, the process may terminate if the newly added instances suffice to prove the unsatisfiability of the original formula. We will refer to the game between the ground solver that provides assignments for the abstraction of the formula and the instantiation module that provides instances added conjunctively to the formula, as the instantiation loop of the SMT solver (see Fig. 1).
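The loop described above can be sketched in a few lines under heavy simplifying assumptions: only unary predicates over constants, one variable per quantified clause, a brute-force propositional search standing in for the ground solver, a full truth assignment standing in for E, and instances chosen by scanning the constants in a fixed order. The names `solve` and `instantiation_loop` are hypothetical and do not correspond to any solver's API.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Atom = Tuple[str, str]                  # (predicate, constant), e.g. ("P", "a")
GroundClause = List[Tuple[Atom, bool]]  # disjunction of (atom, polarity)
QuantClause = List[Tuple[str, bool]]    # clause over one variable: (predicate, polarity)


def solve(clauses: List[GroundClause]) -> Optional[Dict[Atom, bool]]:
    """Brute-force stand-in for the ground solver: a satisfying assignment or None."""
    atoms = sorted({atom for clause in clauses for atom, _ in clause})
    for values in product([False, True], repeat=len(atoms)):
        model = dict(zip(atoms, values))
        if all(any(model[a] == pol for a, pol in clause) for clause in clauses):
            return model
    return None


def instantiation_loop(ground: List[GroundClause], quantified: List[QuantClause],
                       constants: List[str], max_rounds: int = 50) -> str:
    clauses = list(ground)
    for _ in range(max_rounds):
        model = solve(clauses)
        if model is None:
            return "unsat"
        new_instances = []
        for clause in quantified:
            for c in constants:
                instance = [((pred, c), pol) for pred, pol in clause]
                # Keep the first instance that the current assignment does not satisfy.
                if not any(model.get(atom) == pol for atom, pol in instance):
                    new_instances.append(instance)
                    break
        if not new_instances:
            return "sat"  # every instance over the constants is already satisfied
        clauses.extend(new_instances)
    return "unknown"


# E = {¬P(a), R(a)} with the quantified clause ∀x. ¬R(x) ∨ P(x)
print(instantiation_loop([[(("P", "a"), False)], [(("R", "a"), True)]],
                         [[("R", False), ("P", True)]], ["a"]))
# E = {P(a)} with the quantified clause ∀x. P(x), over constants a and b
print(instantiation_loop([[(("P", "a"), True)]], [[("P", True)]], ["a", "b"]))
```

The first call reports unsatisfiability after one round of instantiation, while the second stops once no unsatisfied instance remains. A real solver works with partial assignments, equality and function symbols, none of which this sketch models.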

### **3 Herbrand Theorem and Beyond**

The Herbrand Theorem (see e.g. [16]) for pure first-order logic with equality<sup>1</sup> provides a refutationally complete procedure to check the satisfiability of a formula ψ, or more specifically of a set of literals and quantifiers E ∪ Q. Indeed, E ∪ Q is satisfiable if and only if E ∪ Q<sub>g</sub> is satisfiable, where Q<sub>g</sub> is the set of all (Herbrand) instances one can build from the quantifiers in Q by instantiation with the Herbrand universe, i.e. all the possible well-sorted terms built on the signature used in E ∪ Q. Based on this, an instantiation module has a simple refutationally complete strategy for pure first-order logic with equality: it suffices to enumerate Herbrand instances. The major drawback of this strategy is that the Herbrand universe is large. For instance, as soon as there is a function with the range sort also used as an argument, the Herbrand universe is infinite.

<sup>1</sup> The Herbrand Theorem is generally presented in pure first-order logic without equality, but it also holds for equality: it suffices to consider the equality axioms conjunctively with formulas.
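To give a feeling for how quickly the Herbrand universe grows, the following sketch enumerates its ground terms up to a given nesting depth. It assumes a mono-sorted signature, represents terms as plain strings, and uses the hypothetical helper name `herbrand_universe`; it is an illustration rather than part of any solver.

```python
from itertools import product
from typing import Dict, Iterable, Set


def herbrand_universe(constants: Iterable[str], functions: Dict[str, int],
                      depth: int) -> Set[str]:
    """Ground terms of nesting depth at most `depth` in a mono-sorted signature."""
    terms = set(constants)
    for _ in range(depth):
        layer = set(terms)
        for name, arity in functions.items():
            # Apply each function symbol to every tuple of terms built so far.
            for args in product(sorted(terms), repeat=arity):
                layer.add(f"{name}({', '.join(args)})")
        terms = layer
    return terms


for d in range(4):
    print(d, len(herbrand_universe({"a", "b"}, {"g": 2}, d)))
print(sorted(herbrand_universe({"a", "b"}, {}, 3)))
```

With two constants and one binary function the number of terms goes from 2 to 6, 38 and 1446 over the first three levels, whereas a signature without function symbols never grows beyond its constants.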

Fortunately, a stronger variant of the Herbrand Theorem holds. Using this variant, the instantiation module does not need to consider all possible well-sorted terms (i.e. the full Herbrand universe), but only the terms already available in E ∪ Q, and those subsequently generated.

**Theorem 1.** *Consider the conjunctive sets* E *and* Q *of ground literals and universally quantified clauses respectively where* **T**(E) *contains at least one term of each sort. The set* E ∪ Q *is unsatisfiable in pure first-order logic if and only if there exists a series* Q<sub>i</sub> *of finite sets of instances of* Q *such that*

*– for some number* $n$*, the finite set of formulas* $\mathsf{E}\cup\bigcup_{i=1}^{n}\mathsf{Q}_{i}$ *is unsatisfiable;*

*–* $\mathsf{Q}_{i+1}\subseteq\{\varphi\sigma \mid \forall\bar{x}.\,\varphi\in\mathsf{Q},\ \mathsf{ran}(\sigma)\subseteq\mathbf{T}(\mathsf{E}\cup\bigcup_{j=1}^{i}\mathsf{Q}_{j})\}$*.*

*Proof.* All proofs for this section are included in [26]. 

The above theorem is stronger than the classical Herbrand theorem in the sense that the set of instances considered above is smaller than (or equal to) the set of instances considered in the classical Herbrand theorem. As a trivial example, if a function f appears only in E ∪ Q in ground terms, no new applications of f are considered. The theorem does not consider all arbitrary terms from the signature, but only those that are generated by the successive instantiations with only already available ground terms. Note that the theorem holds for pure first-order logic with equality, and in any theory that preserves the compactness property. In the latter case, however, it is also necessary to consider the axioms of the theory for the generation of new terms, since they might lead to other instances.
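The difference with the full Herbrand universe can be observed by tracking which terms become available round after round. The sketch below is an illustration under assumptions that are not from the paper: terms are nested tuples such as `("f", "a")`, each quantified clause is given as a pair of its variables and its atoms, polarity is ignored because only the terms matter here, and the name `instantiation_rounds` is hypothetical.

```python
from itertools import product
from typing import List, Sequence, Set, Tuple, Union

Term = Union[str, Tuple]  # a constant or variable name, or (symbol, arg1, ..., argn)


def substitute(t: Term, sigma: dict) -> Term:
    if isinstance(t, str):
        return sigma.get(t, t)
    return (t[0],) + tuple(substitute(a, sigma) for a in t[1:])


def ground_terms(atom: Tuple) -> Set[Term]:
    """All terms occurring as (sub)arguments of a ground atom."""
    found: Set[Term] = set()

    def visit(t: Term) -> None:
        found.add(t)
        if not isinstance(t, str):
            for a in t[1:]:
                visit(a)

    for arg in atom[1:]:
        visit(arg)
    return found


def instantiation_rounds(e_atoms: Sequence[Tuple],
                         quantified: Sequence[Tuple[Sequence[str], Sequence[Tuple]]],
                         rounds: int) -> List[Set[Term]]:
    """Term sets available after 0, 1, ..., `rounds` rounds of instantiation."""
    terms: Set[Term] = set().union(*(ground_terms(a) for a in e_atoms))
    history = [set(terms)]
    for _ in range(rounds):
        new_terms = set(terms)
        for variables, atoms in quantified:
            # Instantiate only with terms that are already available.
            for values in product(sorted(terms, key=repr), repeat=len(variables)):
                sigma = dict(zip(variables, values))
                for atom in atoms:
                    new_terms |= ground_terms(substitute(atom, sigma))
        terms = new_terms
        history.append(set(terms))
    return history


# f occurs under a quantifier: one new application of f appears per round.
grows = instantiation_rounds([("P", "a")], [(("x",), [("P", "x"), ("P", ("f", "x"))])], 2)
# f occurs only in a ground term: no new application of f is ever produced.
stable = instantiation_rounds([("P", ("f", "a"))], [(("x",), [("P", "x")])], 3)
print([len(s) for s in grows], [len(s) for s in stable])
```

In the second call the term set stays at {a, f(a)} forever, although the Herbrand universe of that signature also contains f(f(a)), f(f(f(a))) and so on.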

In the Bernays-Schönfinkel-Ramsey fragment of first-order logic (also known as the EPR class) formulas do not contain non-constant function symbols, therefore the Herbrand universe of any formula is a finite set. Since the above sets of terms are a subset of the Herbrand universe, the enumeration will always terminate, even when the formula is satisfiable. Therefore, the resulting ground problem is decidable, and the above method comprises a decision procedure for this fragment, just like some variant of model-based quantifier instantiation.

Theorem 1 implies that an instantiation module only has to consider terms occurring within assignments, and not all possible terms. To show refutational completeness (termination on unsatisfiable input) and model soundness (termination without declaring unsatisfiability implies that the input is satisfiable), it is however necessary to account for the successive assignments produced by the ground SMT solver and the consecutive generation of instances. This is achieved using the following lemma.

**Lemma 1.** *Consider the conjunctive sets* E *and* Q *of ground literals and universally quantified clauses respectively where* **T**(E) *contains at least one term of each sort. If there exists an infinite series of finite satisfiable sets of ground literals* E<sub>i</sub> *and of finite sets of ground instances* Q<sub>i</sub> *of* Q *such that*

*–* $\mathsf{Q}_{i}=\{\varphi\sigma \mid \forall\bar{x}.\,\varphi\in\mathsf{Q},\ \mathsf{dom}(\sigma)=\{\bar{x}\}\wedge\mathsf{ran}(\sigma)\subseteq\mathbf{T}(\mathsf{E}_{i})\}$*;*

*–* $\mathsf{E}_{0}=\mathsf{E}$*,* $\mathsf{E}_{i+1}\models\mathsf{E}_{i}\cup\mathsf{Q}_{i}$*;*

*then* E ∪ Q *is satisfiable in the empty theory with equality.*

The above lemma has two direct consequences on the instantiation loop of SMT solvers, where instances are generated from the set of available terms in the ground assignment provided by the ground SMT solver. The following two corollaries state the model soundness and the refutational completeness of the instantiation loop respectively.

**Corollary 1.** *Given a formula* ψ*, if there exists a satisfiable set of literals* E *and a set of quantified clauses* Q *such that* E ∪ Q ⊨ ψ *and the instantiation module of the SMT solver cannot generate any new instance, i.e.* E *already entails all instances of* Q *for substitutions built with terms* **T**(E)*, then* ψ *is satisfiable.*

*Proof.* A formal statement of the corollary and a proof are available in [26].

**Corollary 2.** *Given an unsatisfiable formula, if the generation of instances is fair, the instantiation loop of the SMT solver terminates.*

*Proof.* A formal statement of the corollary and a proof are available in [26].


**Fig. 2.** Quantifier Instantiation strategies: Conflict-based Instantiation (**c**), E-matching instantiation (**e**), Model-based Instantiation (**m**) and Enumerative Instantiation (**u**).

### **4 Quantifier Instantiation in CDCL(*T*)**

This section overviews recent techniques used by SMT solvers for quantifier instantiation, and comments on their relative strengths and weaknesses. We will focus on enumerative quantifier instantiation, a technique which has received little attention in recent work, but has several compelling advantages with respect to current techniques.

**Definition 1 (Instantiation Strategy).** *An instantiation strategy takes as input:*


*It outputs a set of substitutions* {σ<sub>1</sub>, ..., σ<sub>n</sub>} *where* dom(σ<sub>i</sub>) = x̄ *for each* i = 1, ..., n*.*

Figure 2 gives four instantiation strategies used by modern SMT solvers, each of which has the interface given in Definition 1. The first three have been described in detail in previous works (see [25] for a recent overview). We briefly review these techniques in this section. The fourth, enumerative quantifier instantiation, is the subject of this paper.

Conflict-based instantiation (**c**) was introduced in [28] as a technique for improving the performance of SMT solvers for unsatisfiable problems. In this strategy, we return a substitution σ such that ϕσ together with E is unsatisfiable. We refer to ϕσ as a *conflicting instance* (for E). Typical implementations of this strategy do not insist that a conflicting instance be returned if one exists, and hence the strategy may choose to return the empty set of substitutions. Recent work [4,5] gives a strategy for conflict-based instantiation that has refutational completeness guarantees for the empty theory with equality, that is, when a conflicting instance exists for a quantified formula in this theory, the strategy is guaranteed to return it.

E-matching instantiation (**e**) is the most commonly used strategy for quantifier instantiation in modern SMT solvers [13,15,18]. In this strategy, we first heuristically choose a set of *triggers* for a quantified formula ∀x̄. ϕ, where a trigger is a tuple of terms whose free variables are x̄. In practice, triggers can be selected using user-provided annotations, or selected automatically by the SMT solver. For each trigger t̄<sub>i</sub>, we select a set of substitutions S<sub>i</sub> such that for each σ in this set, E entails that t̄<sub>i</sub>σ is equal to a tuple of ground terms g<sub>i</sub> in E. We return the union of these sets S<sub>i</sub> for each selected trigger. E-matching instantiation is generally incomplete, but works well in practice for unsatisfiable problems, and hence is a key component of most SMT solvers that support quantified formulas.
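As a rough illustration of trigger matching, the sketch below matches a single trigger term against a list of ground terms. It is deliberately weaker than E-matching: it is purely syntactic and ignores the equalities entailed by E, and it handles one term rather than a tuple of terms. Terms are nested tuples, and the names `match` and `e_matching` are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple, Union

Term = Union[str, Tuple]  # a constant or variable name, or (function, arg1, ..., argn)


def match(pattern: Term, term: Term, variables: Set[str],
          sigma: Dict[str, Term]) -> Optional[Dict[str, Term]]:
    """Extend sigma so that the instantiated pattern equals term, or return None."""
    if isinstance(pattern, str):
        if pattern in variables:
            if pattern in sigma:
                # A repeated variable must be bound to the same term every time.
                return sigma if sigma[pattern] == term else None
            return {**sigma, pattern: term}
        return sigma if pattern == term else None
    if isinstance(term, str) or pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        sigma = match(p, t, variables, sigma)
        if sigma is None:
            return None
    return sigma


def e_matching(trigger: Term, ground_terms: Iterable[Term],
               variables: Set[str]) -> List[Dict[str, Term]]:
    """Collect one substitution per ground term that the trigger matches syntactically."""
    results = []
    for g in ground_terms:
        sigma = match(trigger, g, variables, {})
        if sigma is not None:
            results.append(sigma)
    return results


ground = [("f", "a", ("g", "b")), ("f", "a", "c"), ("g", "b"), "a"]
print(e_matching(("f", "x", ("g", "y")), ground, {"x", "y"}))
```

Only the first ground term matches the trigger f(x, g(y)) here. An actual E-matching implementation would also return substitutions for terms that are equal to a match modulo the equalities in E.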

Model-based quantifier instantiation (**m**) was introduced in [19], and has also been used for improving the performance of finite model finding [29]. In this strategy, we first construct a model *M* for the quantifier-free portion of our input E, where typically the interpretations of functions for values not constrained by E are chosen heuristically. Notice that *M* does not necessarily satisfy the quantified formula ∀x̄. ϕ. If it does not, we return a single substitution σ for which *M* does not satisfy ϕσ, where typically σ maps variables from x̄ to terms that occur in **T**(E). With respect to conflict-based and E-matching instantiation, model-based quantifier instantiation has the advantage that it is model sound: when it returns ∅, then E ∪ {∀x̄. ϕ} is satisfiable.

This paper revisits enumerative quantifier instantiation (**u**) as a viable alternative to model-based quantifier instantiation. In this strategy, we assume an ordering ⪯ on quantifier-free terms. This ordering is not related to the usual term ordering one generally uses for saturation theorem proving, but rather determines which instance will be generated first. The strategy returns the substitution {x̄ ↦ t̄}, where t̄ is the minimal tuple of terms with respect to ⪯ from **T**(E) such that ϕ{x̄ ↦ t̄} is not entailed by E. We refer to this strategy as enumerative instantiation since in the worst case it generates instantiations by enumerating tuples of all terms of the proper sort from E, according to the ordering ⪯. In practice, the number of instantiations produced by this strategy is kept small by interleaving it with other strategies like **c** or **e**, or due to the fact that a small number of instances may already allow the SMT solver to conclude the input is unsatisfiable. Moreover, thanks to the results in Sect. 3, this strategy is refutationally complete and model sound for quantified formulas in the empty theory with equality.
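A minimal sketch of this strategy is given below. It rests on simplifying assumptions that are not from the paper: the term ordering is just the order of the list passed in, tuples are enumerated lexicographically, and a ground clause counts as entailed by E only when one of its literals occurs in E, which ignores equality reasoning and tautologies. The function name `enumerative_instantiation` is hypothetical, and the literals are those of Example 1.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Set, Tuple

Literal = Tuple[bool, str, Tuple[str, ...]]  # (polarity, predicate, arguments)


def enumerative_instantiation(e: Set[Literal], variables: Sequence[str],
                              clause: List[Literal],
                              terms: Sequence[str]) -> Optional[Dict[str, str]]:
    """Return the first substitution whose instance is not entailed by E, or None."""
    # product() enumerates tuples lexicographically with respect to the order of `terms`.
    for values in product(terms, repeat=len(variables)):
        sigma = dict(zip(variables, values))
        instance = [(pol, pred, tuple(sigma.get(a, a) for a in args))
                    for pol, pred, args in clause]
        # Simplified entailment check: some literal of the instance is already in E.
        if not any(lit in e for lit in instance):
            return sigma
    return None


# E = {¬P(a), ¬P(b), P(c), ¬R(b)} and the clause P(x) ∨ R(x)
E = {(False, "P", ("a",)), (False, "P", ("b",)), (True, "P", ("c",)), (False, "R", ("b",))}
clause = [(True, "P", ("x",)), (True, "R", ("x",))]

print(enumerative_instantiation(E, ["x"], clause, ["a", "b", "c"]))
print(enumerative_instantiation(E, ["x"], clause, ["c", "b", "a"]))
```

With the assumed ordering a, b, c the sketch selects x ↦ a, and with the reversed ordering it skips c (whose instance is already entailed) and selects x ↦ b, which shows how much the outcome depends on the chosen ordering.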

*Example 1.* Consider the set of ground literals E = {¬P(a), ¬P(b), P(c), ¬R(b)}. For the input (E, ∀x. P(x) ∨ R(x)), the strategies in this section will do the following.


In the previous example, clearly $\{x \mapsto b\}$ is the most useful substitution, since it leads to an instance $P(b) \vee R(b)$ which together with $E$ is unsatisfiable. The substitution $\{x \mapsto c\}$ is definitely not useful, since its instance is already entailed by $P(c) \in E$. The substitution $\{x \mapsto a\}$ is potentially useful since it forces the solver to satisfy $P(a) \vee R(a)$. Here, we point out that the effect of enumerative instantiation and model-based instantiation is essentially the same, as both return an instance that is not entailed by $E$. However, the substitutions produced by enumerative instantiation often have advantages over those of model-based instantiation on unsatisfiable problems.
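As a rough illustration of the enumerative strategy on Example 1, the following Python sketch walks the terms in a fixed order and returns the first substitution whose instance is not entailed. The string encoding of literals, the function names, and the purely syntactic entailment test (an instance counts as entailed only if one of its literals occurs in $E$) are assumptions made for this sketch; they are not taken from CVC4.

```python
# Ground literals of Example 1, written as strings; "~" marks negation.
E = {"~P(a)", "~P(b)", "P(c)", "~R(b)"}
# Body of the quantified clause P(x) \/ R(x) as literal templates.
body = ["P({x})", "R({x})"]

def entailed(E, clause):
    """Sound but incomplete check: a ground clause is entailed by E
    if one of its literals already occurs in E."""
    return any(lit in E for lit in clause)

def enumerative(E, body, terms):
    """Return the first substitution, following the order of `terms`,
    whose instance is not entailed by E, or None if there is none."""
    for t in terms:
        instance = [lit.format(x=t) for lit in body]
        if not entailed(E, instance):
            return {"x": t}
    return None

first = enumerative(E, body, ["a", "b", "c"])   # ordering a < b < c
```

With the ordering a, b, c the sketch selects `{"x": "a"}`; if the ordering started with c, the instance for c would be skipped because `P(c)` is in $E$.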

*Example 2.* Consider the set of ground literals $E = \{\neg P(a), R(b), S(c)\}$ and the quantified clauses $Q = \{\forall x.\, R(x) \vee S(x),\ \forall x.\, \neg R(x) \vee P(x),\ \forall x.\, \neg S(x) \vee P(x)\}$ in a mono-sorted signature. Notice that $E \cup Q$ is unsatisfiable: it suffices to consider the instances of the three quantified formulas in $Q$ with $x \mapsto a$. On such an input, model-based instantiation will first construct a model for $E$. Assume this model $\mathcal{M}$ is such that $P^{\mathcal{M}} = \lambda x.\, \bot$, $R^{\mathcal{M}} = \lambda x.\, \mathrm{ite}(x \approx b, \top, \bot)$, and $S^{\mathcal{M}} = \lambda x.\, \mathrm{ite}(x \approx c, \top, \bot)$. Assume further that enumerative instantiation chooses the lexicographic extension of a term ordering where $a \prec b \prec c$. The following table summarizes the result of running the two strategies.


The second and third columns show the sets of possible values of $x$ that are considered with model-based and enumerative instantiation respectively, and the fourth and fifth columns show one possible selection. The instances corresponding to the three substitutions returned by enumerative instantiation, $R(a) \vee S(a)$, $\neg R(a) \vee P(a)$ and $\neg S(a) \vee P(a)$, are unsatisfiable when conjoined with $\neg P(a)$ from $E$, whereas the instances produced by model-based instantiation do not suffice to show that $E \cup Q$ is unsatisfiable. Hence, the latter will consider an extension of $E$ that satisfies the instances $R(a) \vee S(a)$, $\neg R(b) \vee P(b)$ and $\neg S(c) \vee P(c)$ and guess another model for this extension.

A key observation is that useful instantiations can be obscured by guesses made when constructing models $\mathcal{M}$. Here, since we decided $R(a)^{\mathcal{M}} = \bot$, the substitution $\{x \mapsto a\}$ was not considered when applying model-based instantiation to the second quantified formula, and since $S(a)^{\mathcal{M}} = \bot$, the substitution $\{x \mapsto a\}$ was not considered when applying it to the third. In implementations of model-based instantiation, certain values in models are chosen heuristically, leading to this behavior. This is done out of necessity, since determining whether there exists a model that satisfies quantified formulas, even for a fixed context, is a challenging problem.

On the other hand, the range of substitutions considered by enumerative instantiation in the previous example includes all terms that correspond to instances that are not entailed by $E$. The substitutions it considers are "minimally diverse", that is, in the previous example they introduce new predicates on term $a$ only, whereas model-based instantiation introduces new predicates on $a$, $b$ and $c$. Reducing the number of new terms introduced by instantiations can have a significant positive impact on performance in practice. Furthermore, enumerative instantiation has the advantage that a term ordering allows fine-grained heuristics better suited for unsatisfiable problems, which we comment on in Sect. 4.1.
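The comparison in Example 2 can be replayed with a small Python sketch. It encodes the guessed model $\mathcal{M}$, picks one substitution per quantified clause with each strategy, and then checks by brute force over the nine ground atoms whether $E$ together with the chosen instances is still satisfiable. The data encoding, the function names, and the syntactic entailment test are assumptions of this sketch, not the procedure implemented in any solver.

```python
from itertools import product

TERMS = ["a", "b", "c"]                                      # ordered a < b < c
E = {("P", "a"): False, ("R", "b"): True, ("S", "c"): True}  # ground literals of E
# Each clause is a list of (predicate, polarity) pairs over the variable x.
Q = [[("R", True), ("S", True)],    # R(x) \/ S(x)
     [("R", False), ("P", True)],   # ~R(x) \/ P(x)
     [("S", False), ("P", True)]]   # ~S(x) \/ P(x)

def model_value(pred, t):
    """Guessed model M: P is false everywhere, R holds only on b, S only on c."""
    return (pred, t) in {("R", "b"), ("S", "c")}

def model_based(clause):
    """First term whose instance is falsified by M."""
    for t in TERMS:
        if not any(model_value(p, t) == pol for p, pol in clause):
            return t

def enumerative(clause):
    """First term whose instance has no literal occurring in E."""
    for t in TERMS:
        if not any(E.get((p, t)) == pol for p, pol in clause):
            return t

def satisfiable(instances):
    """Brute-force propositional check of E plus the given ground instances."""
    atoms = [(p, t) for p in "PRS" for t in TERMS]
    for values in product([False, True], repeat=len(atoms)):
        interp = dict(zip(atoms, values))
        if all(interp[k] == v for k, v in E.items()) and all(
                any(interp[(p, t)] == pol for p, pol in cl) for cl, t in instances):
            return True
    return False

mbqi = [(cl, model_based(cl)) for cl in Q]   # selects a, b, c
enum = [(cl, enumerative(cl)) for cl in Q]   # selects a, a, a
```

Here `satisfiable(enum)` is `False`, so the three enumerative instances already refute the input, while `satisfiable(mbqi)` is `True` and another round would be needed.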

*Example 3.* Consider the sets $E = \{a \not\approx b,\ b \not\approx c,\ a \not\approx c\}$ and $Q = \{\forall x.\, P(x)\}$. For the input $(E, \forall x.\, P(x))$, model-based quantifier instantiation will first construct a model $\mathcal{M}$ for $E$, where we assume that $P^{\mathcal{M}} = \lambda x.\, \top$. It is easy to see that $\mathcal{M} \models \varphi\{x \mapsto t\}$ for $t \in \{a, b, c\} \subseteq \mathbf{T}(E)$, and hence it returns the empty set of substitutions, indicating that $E \cup Q$ is satisfiable. On the other hand, assume enumerative instantiation chooses the lexicographic extension of a term ordering where $a \prec b \prec c$. Since $E \not\models P(a)$ and $a$ is smaller than $b$ and $c$ according to this ordering, $\mathbf{u}(E, P(x))$ returns the set containing $\{x \mapsto a\}$. Subsequently and for similar reasons, two more iterations of this strategy will be invoked, resulting in the instances $P(b)$ and $P(c)$ before it terminates with the empty set.

In this example, model-based instantiation was able to terminate on the first iteration, since it guessed the correct interpretation for P, whereas enumerative instantiation considered substitutions mapping x to each ground term a, b, c from E. For this reason, model-based instantiation is typically better suited for satisfiable problems.

#### **4.1 Implementing Enumerative Instantiation**

We comment on several important details concerning the implementation of enumerative quantifier instantiation in the SMT solver CVC4.

*Term Ordering.* Given a term ordering $\prec$, CVC4 considers the extension to tuples of terms such that:

$$(t_1, \ldots, t_n) \prec (s_1, \ldots, s_n) \text{ if } \begin{cases} \max_{i=1}^n t_i \prec \max_{i=1}^n s_i, \text{ or}\\ \max_{i=1}^n t_i = \max_{i=1}^n s_i \text{ and } (t_1, \ldots, t_n) \prec_{\text{lex}} (s_1, \ldots, s_n) \end{cases}$$

where $\prec_{\text{lex}}$ is the lexicographic extension of $\prec$. For example, if $a \prec b \prec c$, then we have that $(a, a) \prec (a, b) \prec (b, a) \prec (b, b) \prec (a, c) \prec (c, b) \prec (c, c)$. By this ordering, we consider substitutions involving $c$ only after all combinations of substitutions involving $a$ and $b$ are considered. This choice is important since it leads to instantiations that introduce fewer terms, and are thus more likely to lead to conflicts at the ground level.
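The tuple ordering above can be expressed as a sort key: compare first by the largest component, then lexicographically. The following Python sketch assumes terms are ranked by a dictionary `rank` (a hypothetical encoding of the underlying term ordering) and enumerates all pairs over $a \prec b \prec c$.

```python
from itertools import product

def tuple_key(rank):
    """Build a sort key implementing: max component first, then lexicographic."""
    def key(terms):
        ranks = tuple(rank[t] for t in terms)
        return (max(ranks), ranks)
    return key

rank = {"a": 0, "b": 1, "c": 2}                 # encodes a < b < c
pairs = sorted(product("abc", repeat=2), key=tuple_key(rank))
order = ["".join(p) for p in pairs]
```

All pairs built from a and b come before any pair mentioning c, which is the property the text relies on.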

The underlying term ordering is determined dynamically based on the current set of assertions $E$. At all times, we maintain a finite list of quantifier-free terms such that we have fixed the ordering $t_1 \prec \ldots \prec t_n$. Then, if all combinations of instantiations for $t_1, \ldots, t_n$ are currently entailed by $E$, we choose a term $t \in \mathbf{T}(E)$ such that $E \not\models t \approx t_i$ for $i = 1, \ldots, n$, if one exists, and append it to our ordering so that $t_n \prec t$. The particular choice of $t$ beyond this criterion is arbitrary. An experimental evaluation of more sophisticated term orderings, such as those inspired by first-order automated theorem proving [2], is the subject of future work.

*Entailment Checks.* For a set of ground equalities and disequalities $E$, quantified formula $\forall \bar{x}.\, \varphi$ and substitution $\{\bar{x} \mapsto \bar{t}\}$, CVC4 implements a two-layered method for checking whether the entailment $E \models \varphi\{\bar{x} \mapsto \bar{t}\}$ holds. First, we maintain a cache of instantiations that have already been returned on previous iterations. Hence if $E$ satisfies a set of formulas containing $\varphi\{\bar{x} \mapsto \bar{s}\}$, where $E \models \bar{t} \approx \bar{s}$, then the entailment clearly holds.

Second, we use an incomplete and fast method for inferring when an entailment holds. We first compute from $E$ congruence classes over $\mathbf{T}(E)$. For each $t \in \mathbf{T}(E)$, let $[t]$ be the representative of term $t$ in this equivalence relation. For each function $f$, we use a *term index* data structure $\mathcal{I}_f$ that stores an entry of the form $([t_1], \ldots, [t_n]) \to [f(t_1, \ldots, t_n)] \in \mathcal{I}_f$ for each uninterpreted function application $f(t_1, \ldots, t_n) \in \mathbf{T}(E)$. To check the entailment $E \models \psi$ where $\psi$ is a literal, we update $\psi$ based on the following iterative process until a fixed point is reached:


Then, if the resultant $\psi$ is $\top$, then the entailment holds. Although not shown here, the above process is extended in a straightforward way to handle Boolean structure, and can also be extended to other background theories by incorporating theory-specific rewriting steps.
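To make the role of representatives and the term index concrete, here is a minimal Python sketch of one plausible reading of such a check for equalities. It assumes the congruence classes have already been computed and are given as a map `rep` from constants to representatives, and that the index is a nested dictionary; the term encoding (strings for constants, tuples for applications), the function names, and the example set $E = \{a \approx b, f(a) \approx c\}$ are all hypothetical and not taken from CVC4.

```python
def normalize(term, rep, index):
    """Rewrite a term bottom-up towards a class representative where possible."""
    if isinstance(term, str):                      # constant
        return rep.get(term, term)
    f, *args = term                                # application f(t1, ..., tn)
    args = tuple(normalize(a, rep, index) for a in args)
    # Look up ([t1], ..., [tn]) in the index for f; keep the term if absent.
    return index.get(f, {}).get(args, (f, *args))

def entailed_equality(lhs, rhs, rep, index):
    """Incomplete check of E |= lhs = rhs: True means entailed,
    False only means that this method could not show it."""
    return normalize(lhs, rep, index) == normalize(rhs, rep, index)

# Hypothetical E = {a = b, f(a) = c}: a represents {a, b}, c represents {c, f(a)}.
rep = {"a": "a", "b": "a", "c": "c"}
index = {"f": {("a",): "c"}}

shown = entailed_equality(("f", "b"), "c", rep, index)      # f(b) = c
unknown = entailed_equality(("f", "c"), "c", rep, index)    # f(c) = c
```

The first query succeeds because `b` rewrites to `a` and the index maps `f(a)` to `c`; the second stays undecided since no entry for `f(c)` exists.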

*Restricting Enumeration Space.* Enumerative instantiation can be refined further by noticing that only a subset of the set of terms $\mathbf{T}(E)$ will ever be relevant for showing unsatisfiability of a quantified formula. An approach in this spirit was used by Ge and de Moura [19], where decidable fragments were identified by noticing that the *relevant domains* of quantified formulas in these fragments are guaranteed to be finite. In that work, the relevant domain of a quantified formula $\forall \bar{x}.\, \psi$ is computed based on the terms in $E$ and the structure of its body $\psi$. For example, $t$ is in the relevant domain of function $f$ for all ground terms $f(t)$, the relevant domain of $x$ for a quantified formula containing the term $f(x)$ is equal to the relevant domain of $f$, and so on. A related approach is to use *sort inference* [8,9,22] to compute more precise sort information and thus decrease the number of possible instantiations.

*Example 4.* Say $E \cup Q = \{a \not\approx b,\ f(a) \approx c\} \cup \{\forall x.\, P(f(x))\}$, where $a, b, c, x$ are of sort $\tau$, $f$ is a unary function $\tau \to \tau$, and $P$ is a predicate on $\tau$. It can be shown that $E \cup Q$ is equivalent to $E^s \cup Q^s = \{a_1 \not\approx b_1,\ f_{12}(a_1) \approx c_2\} \cup \{\forall x_1.\, P_2(f_{12}(x_1))\}$, where $a_1, b_1, x_1$ are of sort $\tau_1$, $c_2$ is of sort $\tau_2$, $f_{12}$ is of sort $\tau_1 \to \tau_2$, and $P_2$ is a predicate on $\tau_2$.

Sorts can be inferred in this manner using a linear traversal on the input formula (for details, see for instance Sect. 4 of [22]). This technique narrows the set of terms considered by enumerative instantiation. In the above example, whereas enumerative instantiation for $E \cup Q$ might consider the substitutions $\{x \mapsto c\}$ or $\{x \mapsto f(c)\}$, for $E^s \cup Q^s$ it would not consider $\{x_1 \mapsto c_2\}$ since their sorts are different, nor would it consider $\{x_1 \mapsto f_{12}(c_2)\}$ since $f_{12}(c_2)$ is not a well-sorted term. Moreover, the Herbrand universe of an inferred subsort may be finite when the universe of its parent sort is infinite. In the above example, the Herbrand universe of $\tau_1$ is $\{a_1, b_1\}$ and that of $\tau_2$ is $\{f_{12}(a_1), f_{12}(b_1), c_2\}$, whereas the Herbrand universe of $\tau$ is infinite.
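One common way to realize such a traversal is with a union-find structure over "sort slots": each constant or variable, each argument position of a function or predicate, and each function result gets a slot, and slots that must share a sort are merged. The Python sketch below applies this idea to Example 4. It is an illustrative assumption about how sort inference can be done, not the algorithm of [22] or of CVC4, and all names in it are hypothetical.

```python
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        self.parent[self.find(x)] = self.find(y)

def slot(term):
    """Sort slot of a term: the symbol for a constant or variable,
    the result slot of f for an application f(...)."""
    return term if isinstance(term, str) else (term[0], "ret")

def visit(term, uf):
    """Merge each argument's slot with the matching argument position."""
    if isinstance(term, str):
        uf.find(term)
        return
    f, *args = term
    for i, arg in enumerate(args):
        uf.union(slot(arg), (f, i))
        visit(arg, uf)

def infer_sorts(equations, atoms):
    uf = UnionFind()
    for lhs, rhs in equations:        # both sides of a (dis)equality share a sort
        visit(lhs, uf)
        visit(rhs, uf)
        uf.union(slot(lhs), slot(rhs))
    for atom in atoms:                # predicate atoms only constrain their arguments
        visit(atom, uf)
    return uf

# Example 4: a and b are related by a disequality, f(a) = c, and P(f(x)).
uf = infer_sorts([("a", "b"), (("f", "a"), "c")], [("P", ("f", "x"))])
```

After the traversal, `a`, `b`, `x` and the argument of `f` share one class (playing the role of $\tau_1$), while `c`, the result of `f` and the argument of `P` share another ($\tau_2$).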

*Compound Strategies.* Since the instantiation strategies from this section have their respective strengths and weaknesses, it is valuable to combine them. We consider two ways of combining strategies, which we refer to as *priority* instantiation and *interleaved* instantiation. For base strategies **s1** and **s2**, priority instantiation (**s1**; **s2**) first invokes **s1**. If this strategy returns a non-empty set of substitutions, it returns that set; otherwise it returns the substitutions returned by **s2**. On the other hand, interleaved instantiation (**s1**+**s2**) returns the union of the substitutions returned by the two strategies.
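The two combinators are easy to state as higher-order functions. In the Python sketch below, a strategy is assumed to be a function from an input `(E, phi)` to a set of substitutions; the stub strategies and the tuple encoding of substitutions are hypothetical placeholders used only to exercise the combinators.

```python
def priority(s1, s2):
    """s1; s2: use s1's substitutions if there are any, otherwise fall back to s2."""
    def strategy(E, phi):
        first = s1(E, phi)
        return first if first else s2(E, phi)
    return strategy

def interleaved(s1, s2):
    """s1 + s2: return the union of the substitutions of both strategies."""
    def strategy(E, phi):
        return s1(E, phi) | s2(E, phi)
    return strategy

# Stub strategies standing in for real ones (illustration only).
def silent(E, phi):
    return set()

def picks_a(E, phi):
    return {(("x", "a"),)}

def picks_b(E, phi):
    return {(("x", "b"),)}

fallback = priority(silent, picks_a)(None, None)     # s1 is empty, so s2 is used
preferred = priority(picks_b, picks_a)(None, None)   # s1 is non-empty, s2 is skipped
both = interleaved(picks_b, picks_a)(None, None)     # union of both
```

Because the combinators return strategies themselves, they compose: `priority(c, interleaved(e, u))` would be one way to read a configuration such as **c**;**e**+**u**, under the assumption that `+` binds tighter than `;`.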

Enumerative instantiation is the most effective when used as a complement to heuristic strategies. In particular, we will see in the next section that the strategies **c**;**e;u** and **c**;**e+u** are the most effective strategies for unsatisfiable problems in CVC4.

### **5 Experiments**

This section reports on our experimental evaluation of different strategies based on enumerative instantiation as implemented in the SMT solver CVC4.<sup>2</sup> We present an extensive analysis of enumerative instantiation and compare it with implementations of model-based instantiation on both unsatisfiable and satisfiable benchmarks. Experiments were performed on untyped first-order benchmarks from the TPTP library [33],<sup>3</sup> version 6.4.0, and from SMT-LIB [7], as of October 2017, on logics having quantifiers and either uninterpreted functions or arrays. For the latter, we also considered logics containing other theories such as arithmetic and datatypes. Some benchmarks are solved by all considered configurations of solvers in less than 0.1 s. We discarded those 25 580 benchmarks. In total, 42 065 problems were selected, 14 731 from TPTP and 27 334 from SMT-LIB. All results were produced on StarExec [32], a public execution service for running comparative evaluations of solvers, with a timeout of 300 s.

<sup>2</sup> For details, see http://matryoshka.gforge.inria.fr/pubs/fol enumerative inst/.

<sup>3</sup> In SMT parlance, the logic of these benchmarks is quantified EUF.

**Fig. 3.** CVC4 configurations on unsatisfiable benchmarks with a 300 s timeout.

We follow the convention in Sect. 4 for identifying configurations based on their instantiation strategy. All configurations of CVC4 use conflict-based instantiation [5,28] with highest priority, so we omit the prefix "**c;**" from the names of CVC4 configurations; for example, **e**+**u** in fact means **c**;**e+u**. Sort inference, as discussed in Sect. 4.1, is also used by all configurations of CVC4.

#### **5.1 Impact of Enumerative Instantiation in CVC4**

In this section, we highlight the impact of enumerative instantiation in CVC4 for unsatisfiable benchmarks. Where applicable, we contrast the difference in the impact of enumerative instantiation and model-based instantiation on the performance of CVC4 on unsatisfiable benchmarks.<sup>4</sup>

<sup>4</sup> There are technical details that influence the comparison of these techniques (see [26]).

The comparison of various instantiation strategies supported by CVC4 is summarized in Fig. 3. In the table, each row is dedicated to a library and logic. SMT-LIB is shown in more granularity than TPTP to highlight comparisons of individual strategies. The first column identifies the subset and the second shows its total number of benchmarks. The next seven columns show the number of benchmarks found to be unsatisfiable by each configuration. The last three columns show the results of virtual portfolio solvers, with **uport** combining **e**, **u**, **e**;**u**, and **e**+**u**; and **mport** combining **e**, **m**, **e**;**m**, and **e**+**m**; while **port** combines all seven configurations.

First, we can see that **u** outperforms **m**, as it solves 3 043 more benchmarks overall. While this is not close to the performance of E-matching (**e**), it should be noted that **u** is highly orthogonal to **e**, solving 1 737 benchmarks that could not be solved by **e**.<sup>5</sup> Combining **e** with either **u** or **m**, using either priority or interleaving instantiation, leads to significant gains in performance. Overall the best configuration is **e**+**u**, that is, the interleaving of enumerative instantiation and E-matching, which solves 20 535 benchmarks, that is, 253 more than its counterpart **e**+**m** interleaving model-based instantiation and E-matching, and 1 295 more than E-matching alone. In the UFLIA logic, the enumerative techniques are especially effective in comparison with the model-based ones. In particular, they enable CVC4 to solve previously intractable problems, e.g. the family "sexpr" with 32 problems. These are notoriously hard problems involving the verification of C# programs using Spec# [6]. Z3 can solve 31 of them thanks to its advanced optimizations of E-matching [13]. CVC4 previously could solve at most 16 using techniques combining **e** and **m**, but **u** alone could solve 27, and all 32 are solved by **e**+**u**. Another example is the family "vcc-havoc" in UFNIA, stemming from the verification of concurrent C with VCC [10]. The strategy **e**+**u** solves 940 out of 984 problems, outperforming **e** and its combinations with **m**, which solve at most 860 problems.<sup>6</sup>

The portfolio columns of the table in Fig. 3 highlight the improvement due to enumerative instantiation for CVC4 on the number of solved problems: 712 more problems are solved overall when enumerative instantiation is added to the strategies (see columns **mport** and **port**). The cactus plot of Fig. 3 shows that while the priority strategies are initially quicker, the interleaving ones scale better, solving more hard problems than their priority counterparts. Overall, we conclude that in addition to being much simpler to implement,<sup>7</sup> instantiation strategies that combine E-matching with enumerative instantiation in CVC4 have a noticeable advantage over those that combine E-matching with model-based instantiation on unsatisfiable problems.

<sup>5</sup> Number of uniquely solved benchmarks between configurations are available in [26].

<sup>6</sup> A detailed comparison by families can be seen in [26].

<sup>7</sup> As a rough estimate, the implementation of enumerative instantiation in CVC4 is around 500 lines of code, whereas model-based instantiation is around 4500 lines of code.

#### **5.2 Comparison Against Other SMT Solvers**

In this section, we compare our implementation of enumerative instantiation in CVC4 against another state-of-the-art SMT solver: Z3 [14] (version 4.5.1) which, like CVC4, also relies on E-matching instantiation for handling unsatisfiable problems. Before making the comparison, we first summarize the main differences between Z3 and CVC4. Z3 uses several optimizations for E-matching that are not implemented in CVC4, including the use of code trees and techniques for applying instantiation incrementally during the CDCL(*T*) search (see Sect. 5 of [13]). It also implements techniques for removing previously considered instantiations from its set of known clauses (see Sect. 7 of [13]). The main advantage of CVC4 with respect to Z3 is its use of conflict-based instantiation **c** [28], which is enabled by default in all strategies we considered. It also supports interleaved instantiation strategies as described in Sect. 4.1, whereas Z3 does not. In addition to these differences, Z3 implements model-based instantiation **m** as described in [19], whereas CVC4 implements model-based instantiation as described in [29]. Finally, CVC4 implements enumerative instantiation as described in this paper, which we compare as an alternative to these implementations.

**Fig. 4.** Z3 and CVC4 on unsatisfiable benchmarks with a 300 s timeout.

Figure 4 summarizes the performance of Z3 on our benchmark set. First, as in CVC4, using model-based instantiation to complement E-matching leads to significant gains in Z3, as **z3 e**;**m** solves a total of 1731 more benchmarks than E-matching alone (**z3 e**). In comparison with CVC4, the configuration **z3 e** outperforms **e** in the logics with non-linear arithmetic and other theories, while **e** is better in the others. Finally, Z3's implementation of model-based quantifier instantiation by itself (**z3 m**) is not effective for unsatisfiable benchmarks, solving only 8951 overall.

To further compare Z3 and CVC4, the third column from the left is the number of benchmarks solved by CVC4's E-matching strategy (**e**), which we gave in Fig. 3. The second to last column **uport-i** gives the number of benchmarks solved by at least one of **u**, **e**, or **e**;**u** in CVC4, where we intentionally omit the interleaved strategy **e**+**u**, since Z3 does not support a similar strategy. The column **mport-i** is computed similarly. We compare these with the fifth column, **z3 mport-i**, i.e. the number of benchmarks solved by either **z3 m**, **z3 e** or **z3 e**;**m**. A comparison of these is given in the cactus plot of Fig. 4. We can see that due to Z3's highly optimized implementations, **z3 mport-i** solves the highest number of problems in less than one second (around 13000), whereas the portfolio strategies of CVC4 solve more for larger timeouts. Overall, the best portfolio strategy is enumerative instantiation in CVC4, which solves a total of 21305 unsatisfiable benchmarks, that is, 1965 more benchmarks than **z3 mport-i**, and 470 more benchmarks than **mport-i**. We thus conclude that the use of enumerative instantiation when paired with E-matching and conflict-based instantiation in CVC4 improves the state of the art of instantiation-based SMT solvers for unsatisfiable benchmarks.

*Comparison with Automated Theorem Provers.* Automated theorem provers like Vampire [23] and E [31] use substantially different techniques based on superposition, hence we do not provide an extensive comparison here. However, we do remark that the gains provided by enumerative instantiation were one of the main reasons for CVC4 being more competitive in the 2017 CASC competition of automatic theorem provers [34]. CVC4 placed third in the category with unsatisfiable problems on the empty theory, as in previous years, behind superposition-based theorem provers Vampire and E, which implement complete strategies. There was, however, a 23% reduction in the number of problems that E solves and CVC4 does not, w.r.t. the previous competition, reducing the gap between the two systems.

*Satisfiable Benchmarks.* For satisfiable benchmarks,<sup>8</sup> **m** solves 1350 benchmarks across all theories. As expected, this is much higher than the number solved by **u**, which solves 510 benchmarks, all from the empty theory. Nevertheless, there are 13 satisfiable problems solved by **u** and not by **m**, which shows that enumerative instantiation has some orthogonality on satisfiable benchmarks as well. We conclude that enumeration not only has superior performance to MBQI on unsatisfiable benchmarks, but also can be an alternative for satisfiable benchmarks in the empty theory.

<sup>8</sup> For further details, see [26].

### **5.3 Artifact**

We have produced an artifact [27] to reproduce the experimental results presented in this paper. The artifact contains the binaries of the SMT solvers CVC4 and Z3, the benchmarks on which they were evaluated, and the running scripts for each configuration evaluated. Detailed instructions are given to perform tests on the various benchmark families with all configurations within the time limits, as well as for retrieving the respective results in CSV format. The artifact has been tested in the virtual machine available at [21].

### **6 Conclusion**

We have presented a strengthening of the Herbrand Theorem, and used it to devise an efficient technique for enumerative instantiation. The implementation of this technique in the state-of-the-art SMT solver CVC4 increases its success rate and outperforms existing implementations of MBQI on unsatisfiable problems with quantified formulas. Given its relatively simple implementation, this technique is well poised as an alternative to MBQI for integration in an instantiation-based SMT solver, both to achieve completeness in first-order logic with the empty theory and equality and to improve performance when other theories are considered.

Future work includes further restricting the enumeration space, for instance with ordering criteria in the spirit of resolution-based theorem proving [3]. Another direction is lifting the techniques seen here to reasoning in higher-order logic. To handle quantification over functions it is often necessary to enumerate expressions, and so performing such an enumeration in a principled manner is paramount for this domain. Techniques from syntax-guided function synthesis [1] could be combined with enumerative instantiation to pursue this goal.

**Data Availability Statement and Acknowledgments.** The datasets generated and analyzed during the current study are available in the figshare repository: https://doi.org/10.6084/m9.figshare.5917384.v1.

This work was partially funded by the National Science Foundation under Award 1656926, by the H2020-FETOPEN-2016-2017-CSA project SC<sup>2</sup> (712689), and by the European Research Council (ERC) starting grant Matryoshka (713999). We would like to thank the anonymous reviewers for their comments. We are grateful to Jasmin C. Blanchette for discussions, encouragements and financial support through his ERC grant.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **A Non-linear Arithmetic Procedure for Control-Command Software Verification**

Pierre Roux<sup>1(B)</sup>, Mohamed Iguernlala<sup>2,3</sup>, and Sylvain Conchon<sup>3,4</sup>

<sup>1</sup> ONERA, DTIS, 31055 Toulouse, France, pierre.roux@onera.fr

<sup>2</sup> OCamlPro SAS, 91190 Gif-sur-Yvette, France

<sup>3</sup> LRI, Université Paris-Sud, 91405 Orsay, France

<sup>4</sup> INRIA Saclay – Île-de-France, Toccata, 91893 Orsay, France

**Abstract.** State-of-the-art (semi-)decision procedures for non-linear real arithmetic address polynomial inequalities by means of symbolic methods, such as quantifier elimination, or numerical approaches such as interval arithmetic. Although (some of) these methods offer nice completeness properties, their high complexity remains a limit, despite the impressive efficiency of modern implementations. This appears to be an obstacle to the use of SMT solvers when verifying, for instance, functional properties of control-command programs.

Using off-the-shelf convex optimization solvers is known to constitute an appealing alternative. However, these solvers only deliver approximate solutions, which means they do not readily provide the soundness expected for applications such as software verification. We thus investigate a-posteriori validation methods and their integration in the SMT framework. Although our early prototype, implemented in the Alt-Ergo SMT solver, often does not prove competitive with state-of-the-art solvers, it already gives some interesting results, particularly on control-command programs.

**Keywords:** SMT · Non-linear real arithmetic · Polynomial inequalities · Convex optimization

### **1 Introduction**

Systems of non-linear polynomial constraints over the reals are known to be solvable since Tarski proved that the first-order theory of the real numbers is decidable, by providing a quantifier elimination procedure. This procedure has since been much improved, particularly with the cylindrical algebraic decomposition. Unfortunately, its doubly exponential complexity remains a serious limit to its scalability. It is now integrated into SMT solvers [23]. Although it demonstrates very good practical results, symbolic quantifier elimination seems to remain an obstacle to scalability on some problems. In some cases, branch and bound with interval arithmetic constitutes an interesting alternative [17].

This work has been partially supported by the French ANR projects ANR-12-INSE-0007 Cafein and ANR-14-CE28-0020 Soprano and the project SEFA IKKY.

We investigate the use of numerical optimization techniques, called semidefinite programming, as an alternative. We show in this paper how solvers based on these techniques can be used to design a sound semi-decision procedure that outperforms symbolic and interval-arithmetic methods on problems of practical interest. A noticeable characteristic of the algorithms implemented in these solvers is to only compute approximate solutions.

We explain this by making a comparison with linear programming. There are two competitive methods to optimize a linear objective under linear constraints: the interior point and the simplex algorithms. The interior point algorithm starts from some initial point and performs steps towards an optimal value. These iterations converge to the optimum but not in finitely many steps and have to be stopped at some point, yielding an approximate answer. In contrast, the simplex algorithm exploits the fact that the feasible set is a polyhedron and that the optimum is achieved on one of its vertices. The number of vertices being finite, the optimum can be exactly reached after finitely many iterations. Unfortunately, this nice property does not hold for spectrahedra, the equivalent of polyhedra for semi-definite programming. Thus, all semi-definite programming solvers are based on the interior-point algorithm, or a variant thereof.

To illustrate the consequences of these approximate solutions, consider the proof of $e \le c$ with $e$ a complicated ground expression and $c$ a constant. The inequality $e \le c$ can be proved by exactly computing $e$, giving a constant $c'$, and checking that $c' \le c$. However, if $e$ is only approximately computed, $e \in [c' - \epsilon, c' + \epsilon]$, this is conclusive only when $c' + \epsilon \le c$. In particular, if $e$ is equal to $c$, an exact computation is required. This inability to prove inequalities that are not satisfied with some margin is a well-known property of numerical verification methods [42], which can then be seen as a trade-off between completeness and computation cost.
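The margin requirement can be seen on a tiny numeric instance. The Python sketch below uses exact rationals and assumes a hypothetical expression whose true value is $1/3$, enclosed by an approximation $c' = 0.333$ with error bound $\epsilon = 0.001$; the function name and the numbers are illustrative only.

```python
from fractions import Fraction

def proves_at_most(approx, eps, c):
    """Conclusive only when the whole enclosure [approx - eps, approx + eps]
    lies at or below c, that is, when approx + eps <= c."""
    return approx + eps <= c

# Enclosure [0.332, 0.334] of a value e whose exact value is 1/3.
approx, eps = Fraction(333, 1000), Fraction(1, 1000)

with_margin = proves_at_most(approx, eps, Fraction(1, 2))   # e <= 1/2, holds with margin
tight = proves_at_most(approx, eps, Fraction(1, 3))         # e <= 1/3, holds with equality
```

The first check succeeds, whereas the second is inconclusive even though $e \le 1/3$ is true: without margin, no finite $\epsilon > 0$ suffices.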

The main point of this paper is that, despite their incompleteness, numerical verification methods remain an interesting option when they make it possible to solve, in practice, problems for which other methods have intractable complexity. Our contributions are:


The rest of this paper is organized as follows: Sect. 2 gives a practical example of a polynomial problem, coming from control-command program verification, better handled by numerical methods. Section 3 is dedicated to preliminaries. It introduces basic concepts of sum of squares polynomials and semi-definite programming. In Sect. 4, we compare two methods to derive sound solutions to polynomial problems from approximate answers of semi-definite programming solvers. Section 5 provides some implementation details and discusses experimental results. Finally, Sect. 6 concludes with some related and future work.

```
typedef struct { double x0 , x1, x2; } state;
/*@ predicate inv( state *s) = 6.04 * s->x0 * s->x0 - 9.65 * s->x0 * s->x1
  @ - 2.26 * s->x0 * s->x2 + 11.36 * s->x1 * s->x1
  @ + 2.67 * s->x1 * s->x2 + 3.76 * s->x2 * s->x2 <= 1; */
/*@ requires \valid(s) && inv(s) && -1 <= in0 <= 1;
  @ ensures inv(s); */
void step(state *s, double in0) {
  double pre_x0 = s->x0 , pre_x1 = s->x1, pre_x2 = s->x2;
  s->x0 = 0.9379 * pre_x0 - 0.0381 * pre_x1 - 0.0414 * pre_x2 + 0.0237 * in0;
  s->x1 = -0.0404 * pre_x0 + 0.968 * pre_x1 - 0.0179 * pre_x2 + 0.0143 * in0;
  s->x2 = 0.0142 * pre_x0 - 0.0197 * pre_x1 + 0.9823 * pre_x2 + 0.0077 * in0;
}
```
**Fig. 1.** Example of a typical control-command code in C.

#### **2 Example: Control-Command Program Verification**

Control-command programs usually iterate linear assignments periodically over time. These assignments take into account a measurement (via some *sensor*) of the state of the physical system to control (called *plant* by control theorists) to update an internal state and eventually output orders back to the physical system (through some *actuator*). Figure 1 gives an example of such an update, in0 being the input and s the internal state. The comments beginning with @ in the example are annotations in the ACSL language [12]. They specify that before the execution of the function (requires) s must be a valid pointer satisfying the predicate inv and |in0| ≤ 1 must hold. Under these hypotheses, s still satisfies inv after executing the function (ensures).

To prove that the internal state remains bounded over any execution of the system, a quadratic polynomial<sup>1</sup> can be used as invariant<sup>2</sup>. Checking the validity of these invariants then leads to arithmetic verification conditions (VCs) involving quadratic polynomials. Such VCs can for instance be generated from the program of Fig. 1 by the Frama-C/Why3 program verification toolchain [12,16]. Unfortunately, proving the validity of these VCs seems out of reach for current state-of-the-art SMT solvers. For instance, although Z3 [13] can solve smaller examples with just two internal state variables in a matter of seconds, it ran for a few days on the example of Fig. 1, which has three internal state variables, without reaching a conclusion<sup>3</sup>. In contrast, our prototype can prove it in a fraction of a second, as well as other examples with up to a dozen variables.

<sup>1</sup> For instance, the three-variable polynomial in *inv* in Fig. 1.

<sup>2</sup> Control theorists call these invariants sublevel sets of a quadratic Lyapunov function. Such functions exist for linear systems if and only if they do not diverge.

<sup>3</sup> This is the case even on a simplified version with just arithmetic constructs, i.e., stripped of all the reasoning about pointers and the C memory model.

Verification of control-command programs is a good candidate for numerical methods. These systems are designed to be robust to many small errors, which means that the verified properties are usually satisfied with some margin. Thus, the incompleteness of numerical methods is not an issue for this kind of problem.
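The margin mentioned above can be probed empirically before attempting a proof. The following sketch transcribes `inv` and `step` from Fig. 1 into Python and evaluates the invariant after one step from random states on the boundary of the invariant set. The helper `worst_value_after_step`, the sampling scheme and the choice of extreme inputs are assumptions made for illustration; random testing gives evidence only and is no substitute for the verification discussed in this paper.

```python
import math
import random

def inv_value(x0, x1, x2):
    """Quadratic form of the ACSL predicate inv; inv holds when the value is <= 1."""
    return (6.04 * x0 * x0 - 9.65 * x0 * x1 - 2.26 * x0 * x2
            + 11.36 * x1 * x1 + 2.67 * x1 * x2 + 3.76 * x2 * x2)

def step(x0, x1, x2, in0):
    """Python transcription of the C function step of Fig. 1."""
    return (0.9379 * x0 - 0.0381 * x1 - 0.0414 * x2 + 0.0237 * in0,
            -0.0404 * x0 + 0.968 * x1 - 0.0179 * x2 + 0.0143 * in0,
            0.0142 * x0 - 0.0197 * x1 + 0.9823 * x2 + 0.0077 * in0)

def worst_value_after_step(samples, seed=0):
    """Largest inv_value observed after one step from random boundary states."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        direction = [rng.gauss(0.0, 1.0) for _ in range(3)]
        q = inv_value(*direction)
        if q <= 0.0:
            continue  # degenerate direction, skip it
        scale = 1.0 / math.sqrt(q)        # rescale so that inv_value == 1
        state = [scale * c for c in direction]
        in0 = rng.choice([-1.0, 1.0])     # extreme admissible inputs
        worst = max(worst, inv_value(*step(*state, in0)))
    return worst

print(worst_value_after_step(10000))
```

A printed value below 1 suggests, without proving it, that the postcondition holds with some slack on the sampled states.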

### **3 Preliminaries**

#### **3.1 Emptiness of Semi-algebraic Sets**

Our goal is to prove that conjunctions of polynomial inequalities are unsatisfiable, that is, given some polynomials with real coefficients $p_1, \dots, p_m \in \mathbb{R}[x]$, we want to prove that there does not exist any assignment for the $n$ variables $(x_1, \dots, x_n) \in \mathbb{R}^n$ such that all inequalities $p_1(x_1, \dots, x_n) \ge 0, \dots, p_m(x_1, \dots, x_n) \ge 0$ hold simultaneously. In the rest of this paper, the notation $p \ge 0$ (resp. $p > 0$) means that for all $x \in \mathbb{R}^n$, $p(x) \ge 0$ (resp. $p(x) > 0$).

**Theorem 1.** *If there exist polynomials* $r_i \in \mathbb{R}[x]$ *such that*

$$-\sum\_{i} r\_i \, p\_i > 0 \quad \text{and} \quad \forall i, r\_i \ge 0 \tag{1}$$

*then the conjunction* $\bigwedge_i p_i \ge 0$ *is unsatisfiable*<sup>4</sup>*.*

*Proof.* Assume there exists $x \in \mathbb{R}^n$ such that for all $i$, $p_i(x) \ge 0$. Then, since $r_i \ge 0$, we have $r_i(x)\, p_i(x) \ge 0$, hence $\left(\sum_i r_i\, p_i\right)(x) \ge 0$, which contradicts $-\sum_i r_i\, p_i > 0$.
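As a small illustration of Theorem 1 (the instance is chosen here for exposition and does not come from the paper), consider the univariate constraints $p_1 := 1 - x^2 \ge 0$ and $p_2 := x - 2 \ge 0$, which are clearly incompatible. The constant multipliers $r_1 := 1$ and $r_2 := 2$ are non negative and give

$$-(r_1\, p_1 + r_2\, p_2) = -(1 - x^2) - 2(x - 2) = x^2 - 2x + 3 = (x - 1)^2 + 2 > 0,$$

so condition (1) holds and the conjunction $p_1 \ge 0 \wedge p_2 \ge 0$ is unsatisfiable. Note that the certificate is a square plus a positive constant, which is the shape of witness that the sum of squares technique looks for.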

In fact, under some hypotheses<sup>5</sup> on the $p_i$, the condition (1) is not only sufficient but also necessary, as stated by Putinar's Positivstellensatz [27, Sect. 2.5.1]. Unfortunately, no practical bound is known on the degrees of the polynomials $r_i$. In our prototype, we restrict the degree of each $r_i$ to<sup>6</sup> $d - \deg(p_i)$ where $d := \max_i(\deg(p_i))$, so that $\sum_i r_i\, p_i$ is a polynomial of degree $d$. This is a first source of incompleteness, although benchmarks show that it already makes it possible to solve many interesting problems.

The sum of squares (SOS) technique [26,36] is an efficient way to numerically solve polynomial problems such as (1). The next sections recall its main ideas.

#### **3.2 Sum of Squares (SOS) Polynomials**

A polynomial $p \in \mathbb{R}[x]$ is said to be SOS if there exist polynomials $h_i \in \mathbb{R}[x]$ such that for all $x$,

$$p(x) = \sum\_{i} h\_i^2(x).$$

Although not all non negative polynomials are SOS, being SOS is a sufficient condition to be non negative.


*Example 1 (from* [36]*).* Considering $p(x_1, x_2) = 2x_1^4 + 2x_1^3 x_2 - x_1^2 x_2^2 + 5x_2^4$, there exist $h_1(x_1, x_2) = \frac{1}{\sqrt{2}}\left(2x_1^2 - 3x_2^2 + x_1 x_2\right)$ and $h_2(x_1, x_2) = \frac{1}{\sqrt{2}}\left(x_2^2 + 3x_1 x_2\right)$ such that $p = h_1^2 + h_2^2$. This proves that for all $x_1, x_2 \in \mathbb{R}$, $p(x_1, x_2) \ge 0$.

Any polynomial $p$ of degree $2d$ (a non negative polynomial is necessarily of even degree) can be written as a quadratic form in the vector of all monomials of degree less than or equal to $d$:

$$p(x) = z^T Q \, z \tag{2}$$

where $z = \left[1, x_1, \dots, x_n, x_1 x_2, \dots, x_n^d\right]^T$ and $Q$ is a constant symmetric matrix.

*Example 2.* For $p(x_1, x_2) = 2x_1^4 + 2x_1^3 x_2 - x_1^2 x_2^2 + 5x_2^4$, we have<sup>7</sup>

$$\begin{split} p(x_1, x_2) &= \begin{bmatrix} x_1^2 \\ x_2^2 \\ x_1 x_2 \end{bmatrix}^T \begin{bmatrix} q_{11} & q_{12} & q_{13} \\ q_{12} & q_{22} & q_{23} \\ q_{13} & q_{23} & q_{33} \end{bmatrix} \begin{bmatrix} x_1^2 \\ x_2^2 \\ x_1 x_2 \end{bmatrix} \\ &= q_{11} x_1^4 + 2q_{13} x_1^3 x_2 + (q_{33} + 2q_{12}) x_1^2 x_2^2 + 2q_{23} x_1 x_2^3 + q_{22} x_2^4. \end{split}$$

Thus $q_{11} = 2$, $2q_{13} = 2$, $q_{33} + 2q_{12} = -1$, $2q_{23} = 0$ and $q_{22} = 5$. Two possible examples for the matrix $Q$ are shown below:

$$Q = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 5 & 0 \\ 1 & 0 & -3 \end{bmatrix}, \qquad Q' = \begin{bmatrix} 2 & -3 & 1 \\ -3 & 5 & 0 \\ 1 & 0 & 5 \end{bmatrix}.$$

The polynomial $p$ is then SOS if and only if there exists a positive semi-definite matrix $Q$ satisfying (2). A matrix $Q$ is called positive semi-definite, noted $Q \succeq 0$, if, for all vectors $x$, $x^T Q\, x \ge 0$. Just as a scalar $q \in \mathbb{R}$ is non negative if and only if $q = r^2$ for some $r \in \mathbb{R}$ (typically $r = \sqrt{q}$), $Q \succeq 0$ if and only if $Q = R^T R$ for some matrix $R$ (then, for all $x$, $x^T Q x = (Rx)^T (Rx) = \|Rx\|_2^2 \ge 0$). The vector $Rz$ is then a vector of polynomials $h_i$ such that $p = \sum_i h_i^2$.

*Example 3.* In the previous example, the matrix $Q$ is not positive semi-definite (for $x = [0, 0, 1]^T$, $x^T Q\, x = -3$). In contrast, $Q' \succeq 0$ as $Q' = R^T R$ with

$$R = \frac{1}{\sqrt{2}} \begin{bmatrix} 2 & -3 & 1 \\ 0 & 1 & 3 \end{bmatrix}$$

giving the decomposition of Example 1.
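The claims of Examples 2 and 3 can be checked mechanically. The sketch below expands $z^T Q z$ for both candidate matrices, evaluates the quadratic form of $Q$ at $[0, 0, 1]^T$, and checks $Q' = R^T R$ through the integer matrix $\sqrt{2} R$ so that only exact integer arithmetic is needed. The helper names (`gram_to_poly`, `quad_form`, `gram_of_rows`) are illustrative and not part of any library.

```python
# Monomial basis z = [x1^2, x2^2, x1*x2], as exponent pairs (degree in x1, degree in x2).
Z = [(2, 0), (0, 2), (1, 1)]

# p = 2 x1^4 + 2 x1^3 x2 - x1^2 x2^2 + 5 x2^4, as {exponents: coefficient}.
P = {(4, 0): 2, (3, 1): 2, (2, 2): -1, (0, 4): 5}

Q = [[2, 1, 1], [1, 5, 0], [1, 0, -3]]
Q_PRIME = [[2, -3, 1], [-3, 5, 0], [1, 0, 5]]
R_TIMES_SQRT2 = [[2, -3, 1], [0, 1, 3]]  # sqrt(2) * R has integer entries

def gram_to_poly(gram, z=Z):
    """Expand z^T gram z into a dict {exponents: coefficient}, dropping zeros."""
    poly = {}
    for i, zi in enumerate(z):
        for j, zj in enumerate(z):
            mono = tuple(a + b for a, b in zip(zi, zj))
            poly[mono] = poly.get(mono, 0) + gram[i][j]
    return {m: c for m, c in poly.items() if c != 0}

def quad_form(gram, x):
    """Evaluate x^T gram x."""
    n = len(x)
    return sum(x[i] * gram[i][j] * x[j] for i in range(n) for j in range(n))

def gram_of_rows(rows):
    """Compute rows^T rows."""
    n = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]

print(gram_to_poly(Q) == P, gram_to_poly(Q_PRIME) == P)
print(quad_form(Q, [0, 0, 1]))
print(gram_of_rows(R_TIMES_SQRT2) == [[2 * v for v in row] for row in Q_PRIME])
```

Both matrices represent the same polynomial, but only $Q'$ is a Gram matrix, which is what certifies that $p$ is SOS.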

#### **3.3 Semi-Definite Programming (SDP)**

Given symmetric matrices $C, A_1, \dots, A_m \in \mathbb{R}^{s \times s}$ and scalars $a_1, \dots, a_m \in \mathbb{R}$, the following optimization problem is called *semi-definite programming*

$$\begin{array}{ll}\text{minimize} & \text{tr}(CQ) \\ \text{subject to} & \text{tr}(A\_1Q) = a\_1 \\ & \vdots \\ & \text{tr}(A\_mQ) = a\_m \\ & Q \succeq 0 \end{array} \tag{3}$$

<sup>7</sup> All monomials of $p$ are of degree 4, so $z$ does not need to contain $1$, $x_1$ and $x_2$.

where the symmetric matrix $Q \in \mathbb{R}^{s \times s}$ is the variable, $\mathrm{tr}(M) = \sum_i M_{i,i}$ denotes the trace of the matrix $M$ and $Q \succeq 0$ means $Q$ positive semi-definite.

*Remark 1.* Since the matrices are symmetric, $\mathrm{tr}(AQ) = \mathrm{tr}(A^T Q) = \sum_{i,j} A_{i,j}\, Q_{i,j}$. The constraints $\mathrm{tr}(AQ) = a$ are then affine constraints between the entries of $Q$.

As we have just seen in Sect. 3.2, existence of an SOS decomposition amounts to existence of a positive semi-definite matrix satisfying a set of affine constraints, that is, a solution of a semi-definite program. Semi-definite programming is a convex optimization problem for which there exist efficient numerical solvers [7, 44], which makes it possible to solve problems involving polynomial inequalities over the reals.
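To make the link with (3) concrete, two of the coefficient equalities of Example 2 can be written as trace constraints. The numbering of the matrices $A_1$ and $A_2$ below is chosen here for illustration only:

$$A_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix},\quad \mathrm{tr}(A_1 Q) = q_{11} = 2, \qquad A_2 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix},\quad \mathrm{tr}(A_2 Q) = 2q_{12} + q_{33} = -1.$$

The first constraint fixes the coefficient of $x_1^4$ and the second the coefficient of $x_1^2 x_2^2$. The three remaining coefficients of $p$ are handled in the same way, giving five affine constraints on the entries of $Q$.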

#### **3.4 Parametric Problems**

Up to now, we have only seen how to check whether a given polynomial p with fixed coefficients is SOS (which implies its non negativeness). However, according to Sect. 3.1, we need to solve problems in which polynomials p have coefficients that are not fixed but parameters. One of the great strengths of SOS programming is its ability to solve such problems.

An unknown polynomial $p \in \mathbb{R}[x]$ of degree $d$ with $n$ variables can be written

$$p = \sum\_{\alpha\_1 + \dots + \alpha\_n \le d} p\_{\alpha} x\_1^{\alpha\_1} \dots x\_n^{\alpha\_n}$$

where the $p_\alpha$ are scalar parameters. A constraint such as $r_i \ge 0$ in (1) can then be replaced by "$r_i$ is SOS", that is: $\exists\, Q \succeq 0,\ r_i = z^T Q\, z$, which is a set of affine equalities between the coefficients of $Q$ and the coefficients $r_{i,\alpha}$ of $r_i$. This can be cast as a semi-definite programming problem<sup>8</sup>.

Thus, problems with unknown polynomials p, as the one presented in Sect. 3.1, can be numerically solved through SOS programming.

*Remark 2 (Complexity).* The number $s$ of monomials in $n$ variables of degree less than or equal to $d$, i.e., the size of the vector $z$ in the decomposition $p(x) = z^T Q\, z$, is $s := \binom{n+d}{d}$. This is polynomial in $n$ for a fixed $d$ (and vice versa). In practice, current SDP solvers can solve problems where $s$ is about a few hundred. This makes the SOS relaxation tractable for small values of $n$ and $d$ ($n \sim 10$ and $d \sim 3$, for instance). Our benchmarks indicate this is already enough to solve some practical problems that remain out of reach for other methods.
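The growth of $s$ is easy to tabulate. The short sketch below uses the standard library function `math.comb`; the function name `monomial_count` and the sampled pairs $(n, d)$ are illustrative choices.

```python
from math import comb

def monomial_count(n, d):
    """Number of monomials in n variables of degree at most d, i.e. the size of z."""
    return comb(n + d, d)

for n, d in [(2, 2), (3, 2), (10, 3), (12, 3)]:
    # The Gram matrix Q has s * s entries, which is what the SDP solver manipulates.
    s = monomial_count(n, d)
    print(f"n={n:2d} d={d} s={s:4d} entries of Q={s * s}")
```

For $n = 10$ and $d = 3$ this gives $s = 286$, which matches the "few hundred" range mentioned in the remark.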

<sup>8</sup> By encoding the $r_{i,\alpha} \in \mathbb{R}$ as $r^+_{i,\alpha} - r^-_{i,\alpha}$ with $r^+_{i,\alpha}, r^-_{i,\alpha} \ge 0$ and putting the new variables in a block diagonal matrix variable $Q' := \mathrm{diag}(Q, \dots, r^+_{i,\alpha}, r^-_{i,\alpha}, \dots)$.

### **4 Numerical Verification of SOS**

According to Sect. 3.1, a conjunction of polynomial constraints can be proved unsatisfiable by exhibiting other polynomials satisfying some constraints. Section 3.4 shows that such polynomials can be efficiently found by some numerical optimization solvers. Unfortunately, due to the algorithms they implement, we cannot directly trust the results of these solvers. This section details this issue and reviews two a-posteriori validation methods, with their respective weaknesses.

#### **4.1 Approximate Solutions from SDP Solvers**

In practice, the matrix Q returned by SDP solvers upon solving an SDP problem (3) does not precisely satisfy the equality constraints, due both to the algorithms used and their implementation with floating-point arithmetic. Therefore, although the SDP solver returns a positive answer for a SOS program, this does not constitute a valid proof that a given polynomial is SOS.

Most SDP solvers start from some $Q \succeq 0$ not satisfying the equality constraints (for instance the identity matrix) and iteratively modify it in order to reduce the distance between $\mathrm{tr}(A_i Q)$ and $a_i$ while keeping $Q$ positive semi-definite. This process is stopped when the distance is deemed small enough. This final distance $\epsilon$ is called the *primal infeasibility*, and is one of the result quality measures displayed by SDP solvers<sup>9</sup>. Therefore, we do not obtain a $Q$ satisfying $\mathrm{tr}(A_i Q) = a_i$ but rather $\mathrm{tr}(A_i Q) = a_i + \epsilon_i$ for some small $\epsilon_i$ such that $|\epsilon_i| \le \epsilon$.

#### **4.2 Proving Existence of a Nearby Solution**

This primal infeasibility has a simple translation in terms of our original SOS problem. The polynomial equality $p = z^T Q\, z$ is encoded as one scalar constraint $\mathrm{tr}(A_i Q) = a_i$ for each coefficient $a_i$ of the polynomial $p$ (cf. Example 2). The coefficients of the polynomials $p$ and $z^T Q\, z$ thus differ by some $\epsilon_i$ and, since $|\epsilon_i| \le \epsilon$, there exists a matrix $E \in \mathbb{R}^{s \times s}$ such that, for all $i, j$, $|E_{i,j}| \le \epsilon$ and

$$p = z^T (Q + E) \, z. \tag{4}$$

Proving that $Q + E \succeq 0$ is now enough to prove that the polynomial $p$ is SOS, hence non negative. A sufficient condition is to check<sup>10</sup> $Q - s\,\epsilon\, I \succeq 0$.
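One way to see why this condition is sufficient is the following short derivation, given here as a sketch using only the bound $|E_{i,j}| \le \epsilon$ and the Cauchy-Schwarz inequality $\left(\sum_i |x_i|\right)^2 \le s\, \|x\|_2^2$. For any $x \in \mathbb{R}^s$,

$$x^T (Q + E)\, x \;\ge\; x^T Q\, x - \sum_{i,j} |E_{i,j}|\, |x_i|\, |x_j| \;\ge\; x^T Q\, x - \epsilon \Big(\sum_i |x_i|\Big)^2 \;\ge\; x^T Q\, x - s\,\epsilon\, \|x\|_2^2 \;=\; x^T (Q - s\,\epsilon\, I)\, x.$$

Hence $Q - s\,\epsilon\, I \succeq 0$ implies $Q + E \succeq 0$ for every admissible perturbation $E$, without having to know $E$ itself.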

As seen in Sect. 3.2, checking that a matrix $M$ is positive semi-definite amounts to exhibiting a matrix $R$ such that $M = R^T R$. The Cholesky decomposition algorithm [45, Sect. 1.4] computes such a matrix $R$. Given a matrix $M \in \mathbb{R}^{s \times s}$, it attempts to compute $R$ such that $M = R^T R$ and, when $M$ is not positive semi-definite, it fails by attempting to take the square root of a negative value or to perform a division by zero.

<sup>9</sup> Typically, $\epsilon \sim 10^{-8}$.

<sup>10</sup> In order to get good likelihood for this to hold, we ask the SDP solver for $Q - 2s\,\epsilon\, I \succeq 0$ rather than $Q \succeq 0$, as solvers often return matrices $Q$ slightly not positive definite.

Due to rounding errors, a simple floating-point Cholesky decomposition would produce a matrix $R$ not exactly satisfying the equality $M = R^T R$, hence not proving $M \succeq 0$. However, these rounding errors can be bounded by a matrix $B$ so that, when the floating-point Cholesky decomposition of $M - B$ succeeds, then $M \succeq 0$ is guaranteed to hold. Moreover, $B$ can be easily computed from the matrix $M$ and the characteristics of the floating-point format used [41].

To sum up, the following verification procedure can prove that a given polynomial p is SOS<sup>11</sup>.

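> Let $Q \in \mathbb{R}^{s \times s}$ be the approximate solution returned by an SDP solver for the problem $p = z^T Q\, z \wedge Q \succeq 0$. Then,
>
> 1. compute a bound $\epsilon$ on the absolute values of the coefficients of $p - z^T Q\, z$;
> 2. check that $Q - s\,\epsilon\, I \succeq 0$ with a floating-point Cholesky decomposition; if the check succeeds, $p$ is SOS.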


**Complexity.** Note that step 1 can be achieved using floating-point interval arithmetic in Θ(s<sup>2</sup>) operations while the Cholesky decomposition in step 2 requires Θ(s<sup>3</sup>) floating-point operations. Thus, the whole verification method takes Θ(s<sup>3</sup>) floating-point operations which, in practice, constitutes a very small overhead compared to the time required by the SDP solver to compute Q.

**Soundness.** It is interesting to notice that the soundness of the method does not rely on the SDP solver. Thanks to this pessimistic method, the trusted codebase remains small, and efficient off-the-shelf solvers can be used as untrusted oracles. The method was even verified [31,38] within the Coq proof assistant.

**Incompleteness.** Numerical verification methods can only prove inequalities satisfied with some margin. Here, if the polynomial $p$ to prove SOS (hence $p \ge 0$) reaches the value 0, this usually means that the feasible set of the SDP problem $\{Q \mid p = z^T Q\, z,\ Q \succeq 0\}$ has an empty relative interior (i.e., there is no point $Q$ in this set such that a small ball centered on $Q$ is included in $\{M \mid M \succeq 0\}$) and the method does not work, as illustrated in Fig. 2. This is a second source of incompleteness of our approach, which adds to the limitation of the degrees of the polynomials searched for, as presented in Sect. 3.1.
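The procedure and its incompleteness can be sketched on the polynomial of Example 2. This sketch deliberately swaps the floating-point Cholesky decomposition with error bound of [41] for an exact rational elimination that requires strictly positive pivots, which is simpler to state and still sound but is not the implementation described in this paper. The matrix `Q_APPROX`, standing for a solver output, and all function names are assumptions made for illustration.

```python
from fractions import Fraction

Z = [(2, 0), (0, 2), (1, 1)]                       # z = [x1^2, x2^2, x1*x2]
P = {(4, 0): 2, (3, 1): 2, (2, 2): -1, (0, 4): 5}  # polynomial of Example 2

def gram_to_poly(gram, z):
    """Expand z^T gram z into a dict {exponents: coefficient}."""
    poly = {}
    for i, zi in enumerate(z):
        for j, zj in enumerate(z):
            mono = tuple(a + b for a, b in zip(zi, zj))
            poly[mono] = poly.get(mono, Fraction(0)) + gram[i][j]
    return poly

def is_positive_definite(matrix):
    """Exact symmetric elimination: every pivot must be strictly positive."""
    n = len(matrix)
    a = [[Fraction(v) for v in row] for row in matrix]
    for k in range(n):
        if a[k][k] <= 0:
            return False
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
    return True

def verify_sos(gram, p, z):
    """Step 1: bound the coefficient error eps. Step 2: check gram - s*eps*I."""
    gram = [[Fraction(v) for v in row] for row in gram]
    s = len(z)
    expanded = gram_to_poly(gram, z)
    monomials = set(expanded) | set(p)
    eps = max(abs(Fraction(p.get(m, 0)) - expanded.get(m, Fraction(0)))
              for m in monomials)
    shifted = [[gram[i][j] - (s * eps if i == j else 0) for j in range(s)]
               for i in range(s)]
    return is_positive_definite(shifted)

TINY = Fraction(1, 10**9)
# Hypothetical solver output: a Gram matrix of p with small equality errors.
Q_APPROX = [[2, -2, 1 + TINY], [-2, 5, 0], [1 + TINY, 0, 3 - TINY]]
# Q' from Example 2: an exact Gram matrix, but singular (rank 2).
Q_PRIME = [[2, -3, 1], [-3, 5, 0], [1, 0, 5]]

print(verify_sos(Q_APPROX, P, Z), verify_sos(Q_PRIME, P, Z))
```

The check succeeds on `Q_APPROX` despite its inexact coefficients, because that matrix is positive definite with a comfortable margin. It fails on `Q_PRIME`, which is positive semi-definite but singular, so this margin-based test cannot certify it even though it is an exact solution.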

*Remark 3.* The floating-point Cholesky decomposition is theoretically a third source of incompleteness. However, it is negligible as the entries of the bound matrix $B$ are, in practice, orders of magnitude smaller than the accuracy of the SDP solvers [40].

<sup>11</sup> It is worth noting that the value reported by the solver for $\epsilon$, being just computed with floating-point arithmetic, cannot be formally trusted. It must then be recomputed.

**Fig. 2.** When the feasible set has an empty interior, the subspace $\{M \mid p = z^T M z\}$ is tangent to $\{M \mid M \succeq 0\}$. Thus the ball $\{Q + E\}$ intersecting the subspace almost never lies in $\{M \mid M \succeq 0\}$, making the proof fail.

#### **4.3 Rounding to an Exact Rational Solution**

The most common solution to verify results of SOS programming is to round the output of the SDP solver to an exact rational solution [19,24,33].

To sum up, the matrix $Q$ returned by the SDP solver is first projected to the subspace $\{M \mid p = z^T M z\}$, then all its entries are rounded to rationals with small denominators (first integers, then multiples of $\frac{1}{2}$, $\frac{1}{3}$, ...)<sup>12</sup>. For each rounding, positive semi-definiteness of the resulting matrix $Q$ is tested using a complete check, based on an LDLT decomposition<sup>13</sup> [19]. The rationale behind this choice is that problems involving only simple rational coefficients can reasonably be expected to admit simple rational solutions<sup>14</sup>.
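The rounding heuristic can be sketched as follows on Example 2. This is a simplified illustration under stated assumptions: instead of projecting onto the subspace, it rounds every entry and then checks the affine constraints of Example 2 directly, and the complete semi-definiteness test is written as a plain exact elimination. The input `Q_FLOAT` is a made-up solver output, and the function names are hypothetical.

```python
from fractions import Fraction

def is_psd(matrix):
    """Complete exact test of positive semi-definiteness by symmetric elimination."""
    n = len(matrix)
    a = [[Fraction(v) for v in row] for row in matrix]
    for k in range(n):
        if a[k][k] < 0:
            return False
        if a[k][k] == 0:
            # A zero pivot is only acceptable if the rest of its column vanishes.
            if any(a[i][k] != 0 for i in range(k + 1, n)):
                return False
            continue
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
    return True

def satisfies_example2(q):
    """Affine constraints p = z^T q z of Example 2, plus symmetry."""
    return (q[0][0] == 2 and 2 * q[0][2] == 2 and q[2][2] + 2 * q[0][1] == -1
            and 2 * q[1][2] == 0 and q[1][1] == 5
            and all(q[i][j] == q[j][i] for i in range(3) for j in range(3)))

def round_to_rational(q_float, max_den=10):
    """Try denominators 1, 2, 3, ... and return the first exact valid solution."""
    for den in range(1, max_den + 1):
        q = [[Fraction(round(v * den), den) for v in row] for row in q_float]
        if satisfies_example2(q) and is_psd(q):
            return q
    return None

# Hypothetical floating-point solver output, close to Q' of Example 2.
Q_FLOAT = [[2.0000001, -2.9999999, 1.0000002],
           [-2.9999999, 4.9999998, 1e-8],
           [1.0000002, 1e-8, 5.0000001]]

print(round_to_rational(Q_FLOAT))
```

Rounding to integers already recovers $Q'$ here. Since $Q'$ is singular, this is a case that the margin-based check of Sect. 4.2 cannot handle, whereas the exact test accepts it.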

Using exact solutions potentially makes it possible to verify SDP problems with empty relative interiors. This means the ability to prove inequalities without margin, to distinguish strict and non-strict inequalities and even to handle (dis)equalities. All of this nevertheless requires a different relaxation scheme than (1).

*Example 4.* To prove $x_1 \ge 0 \wedge x_2 \ge 0 \wedge q_1 = 0 \wedge q_2 = 0 \wedge p > 0$ unsatisfiable, with $q_1 := x_1^2 + x_2^2 - x_3^2 - x_4^2 - 2$, $q_2 := x_1 x_3 + x_2 x_4$ and $p := x_3 x_4 - x_1 x_2$, one can look for polynomials $l_1, l_2$ and SOS polynomials $s_1, \dots, s_8$ such that $l_1 q_1 + l_2 q_2 + s_1 + s_2 p + s_3 x_1 + s_4 x_1 p + s_5 x_2 + s_6 x_2 p + s_7 x_1 x_2 + s_8 x_1 x_2 p + p = 0$.

Rounding the result of an SDP solver yields $l_1 = -\frac{1}{2}(x_1 x_2 - x_3 x_4)$, $l_2 = -\frac{1}{2}(x_2 x_3 + x_1 x_4)$, $s_2 = \frac{1}{2}\left(x_3^2 + x_4^2\right)$, $s_7 = \frac{1}{2}\left(x_1^2 + x_2^2 + x_3^2 + x_4^2\right)$ and $s_1 = s_3 = s_4 = s_5 = s_6 = s_8 = 0$. This problem has no margin, since when replacing $p > 0$ by $p \ge 0$, $(x_1, x_2, x_3, x_4) = (0, \sqrt{2}, 0, 0)$ becomes a solution.
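Because the certificate of Example 4 is exact, it can be checked by plain polynomial arithmetic over the rationals. The sketch below represents polynomials as dictionaries from exponent tuples to `Fraction` coefficients; this small representation and its helpers (`add`, `mul`, `scale`, `var`) are written for this illustration and do not correspond to the OSDP API.

```python
from fractions import Fraction

def add(*polys):
    """Sum of polynomials given as {exponents: coefficient} dicts."""
    out = {}
    for poly in polys:
        for mono, coeff in poly.items():
            out[mono] = out.get(mono, 0) + coeff
    return {m: c for m, c in out.items() if c != 0}

def mul(a, b):
    """Product of two polynomials."""
    out = {}
    for m1, c1 in a.items():
        for m2, c2 in b.items():
            mono = tuple(e1 + e2 for e1, e2 in zip(m1, m2))
            out[mono] = out.get(mono, 0) + c1 * c2
    return {m: c for m, c in out.items() if c != 0}

def scale(k, a):
    """Multiply a polynomial by the nonzero scalar k."""
    return {m: k * c for m, c in a.items()}

def var(i, n=4):
    """The variable x_{i+1} as a polynomial in n variables."""
    return {tuple(1 if j == i else 0 for j in range(n)): Fraction(1)}

x1, x2, x3, x4 = (var(i) for i in range(4))
ONE = {(0, 0, 0, 0): Fraction(1)}
HALF = Fraction(1, 2)

q1 = add(mul(x1, x1), mul(x2, x2), scale(-1, mul(x3, x3)),
         scale(-1, mul(x4, x4)), scale(-2, ONE))
q2 = add(mul(x1, x3), mul(x2, x4))
p = add(mul(x3, x4), scale(-1, mul(x1, x2)))

l1 = scale(-HALF, add(mul(x1, x2), scale(-1, mul(x3, x4))))
l2 = scale(-HALF, add(mul(x2, x3), mul(x1, x4)))
s2 = scale(HALF, add(mul(x3, x3), mul(x4, x4)))
s7 = scale(HALF, add(mul(x1, x1), mul(x2, x2), mul(x3, x3), mul(x4, x4)))

# Left-hand side of the identity of Example 4 (the s_i equal to 0 are omitted).
certificate = add(mul(l1, q1), mul(l2, q2), mul(s2, p), mul(s7, mul(x1, x2)), p)
print(certificate)  # an empty dict means that the identity holds
```

The polynomials $s_2$ and $s_7$ are visibly sums of squares, so once the identity is checked the certificate is complete. This kind of easy exact check is what makes the method convenient to implement in proof assistants.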

Under some hypotheses, this relaxation scheme is complete, as stated by a theorem from Stengle [27, Theorem 2.11]. However, similarly to Sect. 3.1, no practical bound is known on the degrees of the relaxation polynomials.

<sup>12</sup> In practice, to ensure that the rounded matrix $Q$ still satisfies the equality $p = z^T Q\, z$, a dual SDP encoding is used, which differs from the encoding introduced in Sect. 3. This dual encoding is also called image representation [36, Sect. 6.1].

<sup>13</sup> The LDLT decomposition expresses a positive semi-definite matrix $M$ as $M = LDL^T$ with $L$ a lower triangular matrix and $D$ a diagonal matrix.

<sup>14</sup> However, there exist rational SDP problems that do not admit any rational solution.

**Complexity.** The relaxation scheme involves products of all polynomials appearing in the original problem constraints. The number of such products, being exponential in the number of constraints, limits the scalability of the approach.

Moreover, to actually enjoy the benefits of exact solutions, the floating-point Cholesky decomposition introduced in Sect. 4.2 cannot be used and has to be replaced by an exact rational decomposition<sup>15</sup>. Computing decompositions of large matrices can then become particularly costly as the size of the involved rationals can blow up exponentially during the computation.

**Soundness.** The exact solutions make for an easy verification. The method is thus implemented in the HOL Light [19] and Coq [4] proof assistants.

**Incompleteness.** Although this verification method can work for some SDP problems with an empty relative interior, the rounding heuristic is not guaranteed to provide a solution. In practice, it tends to fail on large problems or problems whose coefficients are not rationals with small numerators and denominators.

### **5 Experimental Results**

#### **5.1 The OSDP Library**

The SOS to SDP translation described in Sect. 3, as well as the validation methods described in Sect. 4 have been implemented in our OCaml library OSDP. This library offers a common interface to the SDP solvers<sup>16</sup> Csdp [6], Mosek [2] and SDPA [46], giving simple access to SOS programming in contexts where soundness matters, such as SMT solvers or program static analyzers. It is composed of 5 kloc of OCaml and 1 kloc of C (interfaces with SDP solvers) and is available under LGPL license at https://cavale.enseeiht.fr/osdp/.

#### **5.2 Integration of OSDP in Alt-Ergo**

Alt-Ergo [5] is a very effective SMT solver for proving formulas generated by program verification frameworks. It is used as a back-end of different tools and in various settings, in particular via the Why3 [16] platform. For instance, the Frama-C [12] suite relies on it to prove formulas generated from C code, and the SPARK [21] toolset uses it to check formulas produced from Ada programs. It is also used by EasyCrypt [3] to prove formulas issued from cryptographic protocols verification, from the Cubicle [10] model-checker, and from Atelier-B [1].

<sup>15</sup> The Cholesky decomposition, involving square roots, cannot be computed in rational arithmetic, however its LDLT variant can.

<sup>16</sup> Csdp is used for the following benchmarks as it provides the best results.

Alt-Ergo's native input language is a polymorphic first-order logic *à la ML* modulo theories, a very suitable language for expressing formulas generated in the context of program verification. Its reasoning engine is built on top of a SAT solver that interacts with a combination of decision procedures to look for a model for the input formula. Universally quantified formulas, which naturally arise in program verification, are handled via E-matching techniques. Currently, Alt-Ergo implements decision procedures for the free theory of equality with uninterpreted symbols, linear arithmetic over integers and rationals, fragments of non-linear arithmetic, enumerated and record datatypes, and the theory of associative and commutative function symbols (hereafter AC).

Figure 3 shows the simplified architecture of the arithmetic reasoning framework in Alt-Ergo, and the OSDP extension. The first component in the figure is a completion-like algorithm AC(LA) that reasons modulo associativity and commutativity properties of non-linear multiplication, as well as its distributivity over addition<sup>17</sup>. AC(LA) is a modular extension of ground AC completion with a decision procedure for reasoning modulo equalities of linear integer and rational arithmetic [9]. It builds and maintains a convergent term-rewriting system modulo arithmetic equalities and the AC properties of the non-linear multiplication symbol. The rewriting system is used to update a union-find data structure.

**Fig. 3.** Alt-Ergo's arithmetic reasoning framework with OSDP integration.

The second component is an Interval Calculus algorithm that computes bounds of (non-linear) terms: the initial non-linear problem is first relaxed by abstracting non-linear parts, and a Fourier-Motzkin extension<sup>18</sup> is used to infer bounds on the abstracted linear problem. In a second step, axioms of non-linear arithmetic are internally applied by interval propagation. These two steps make it possible to maintain a map associating the terms of the problem (normalized *w.r.t.* the union-find) to unions of intervals.

Finally, the last part is the SAT solver that dispatches equalities and inequalities to the right component and performs case-split analysis over finite domains. Of course, this presentation is very simplified and the exact architecture of Alt-Ergo is much more complicated.

<sup>17</sup> Addition and multiplication by a constant are directly handled by the LA module.

<sup>18</sup> We can also use a simplex-based algorithm [8] for bounds inference.

$$\begin{array}{l}
p'_1 := (p_1 - a_1)(b_1 - p_1), \ldots, p'_k := (p_k - a_k)(b_k - p_k) \\
\quad (p'_i := p_i - a_i \text{ when } b_i = +\infty \text{ or } p'_i := b_i - p_i \text{ when } a_i = -\infty) \\
d := \max_i \left\{ \deg(p'_i) \right\} \\
\text{encode } -\sum_{i=1}^{k} r_i\, p'_i \text{ is SOS},\ r_1 \text{ is SOS}, \ldots, r_k \text{ is SOS} \\
\text{as an SDP problem } -\sum_i r_i\, p'_i = z_0^T Q_0 z_0,\ r_1 = z_1^T Q_1 z_1, \ldots, r_k = z_k^T Q_k z_k \\
\quad \text{with } \deg(r_i) := 2 \left\lceil \frac{d - \deg(p'_i)}{2} \right\rceil \\
\text{call an SDP solver and retrieve } r_1, \ldots, r_k \text{ and } Q_0, Q_1, \ldots, Q_k \\
\text{overapproximate } \epsilon_0 := \max_{\alpha} \left\{ |c_{\alpha}| \;\middle|\; -\sum_i r_i\, p'_i - z_0^T Q_0 z_0 = \sum_{\alpha} c_{\alpha} x^{\alpha} \right\} \\
\quad \text{and } \epsilon_i := \max_{\alpha} \left\{ |c_{\alpha}| \;\middle|\; r_i - z_i^T Q_i z_i = \sum_{\alpha} c_{\alpha} x^{\alpha} \right\} \text{ for } 1 \le i \le k \\
\text{if } Q_i - \#|z_i|\, \epsilon_i\, I \succeq 0 \text{ for all } 0 \le i \le k \text{ then return unsat}
\end{array}$$

**Fig. 4.** Semi-decision procedure to prove $\bigwedge_{i=1}^{k} p_i \in [a_i, b_i]$ unsat. $\#|z|$ is the size of the vector $z$ and $\succeq 0$ is tested with a floating-point Cholesky decomposition [41].

The integration of OSDP in Alt-Ergo is achieved via the extension of the Interval Calculus component of the solver, as shown in Fig. 3: terms that are polynomials, and their corresponding interval bounds, form the problem (1) which is given to OSDP. OSDP attempts to verify its result with the method of Sect. 4.2. When it succeeds, the original conjunction of constraints is proved unsat. Otherwise, (dis)equalities are added and OSDP attempts a new proof by the method of Sect. 4.3. In case of success, unsat is proved, otherwise satisfiability or unsatisfiability cannot be deduced. Outlines of the first algorithm are given in Fig. 4 whereas the second one follows the original implementation [19].

Our modified version of Alt-Ergo is available under CeCILL-C license at https://cavale.enseeiht.fr/osdp/aesdp/.

**Incrementality.** In the SMT context, our theory solver is often successively called with the same problem with a few additional constraints each time. It would then be interesting to avoid doing the whole computation again when a constraint is added, as is usually done with the simplex algorithm for linear arithmetic.

Some SDP solvers do offer to provide an initial point. Our experiments however indicated that this significantly speeds up the computation only when the provided point is extremely close to the solution. A bad initial point could even slow down the computation or, worse, make it fail. This is due to the very different nature of the interior-point algorithms, compared to the simplex, and their convergence properties [7, Part III]. Thus, speed-ups could only be obtained when the previous set of constraints was already unsatisfiable, i.e., a useless case.

**Small Conflict Sets.** When a set of constraints is unsatisfiable, some of them may not play any role in this unsatisfiability. Returning a small subset of unsatisfiable constraints can help the underlying SAT solver. Such useless constraints can easily be identified in (1) when the relaxation polynomial $r_i$ is 0. A common heuristic to maximize their number is to ask the SDP solver to minimize (the sum of) the traces of the matrices $Q_i$.

When using the exact method of Sect. 4.3, the appropriate $r_i$ are exactly 0. Things are not so clear when using the approximate method of Sect. 4.2 since the $r_i$ are only *close to* 0. A simple solution is to rank the $r_i$ by decreasing trace of $Q_i$ before performing a dichotomy search for the smallest prefix of this sequence proved unsatisfiable. Thus, for $n$ constraints, $\log(n)$ SDPs are solved.
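The dichotomy search can be sketched as a standard binary search over prefix lengths. In this sketch, `proves_unsat` is a hypothetical oracle standing for one SDP solve followed by the validation of Sect. 4.2, and it is assumed to be monotone (if a prefix is proved unsatisfiable, so is every longer prefix) and to succeed on the full set. The toy constraints and traces are invented for the example.

```python
def smallest_unsat_prefix(constraints, traces, proves_unsat):
    """Rank constraints by decreasing trace, then binary-search the shortest
    prefix accepted by the oracle (about log2(n) oracle calls)."""
    ranked = [c for _, c in sorted(zip(traces, constraints),
                                   key=lambda pair: -pair[0])]
    low, high = 1, len(ranked)  # invariant: the prefix of length high is proved
    while low < high:
        mid = (low + high) // 2
        if proves_unsat(ranked[:mid]):
            high = mid
        else:
            low = mid + 1
    return ranked[:high]

# Toy data: only the two constraints on x conflict; their traces are not close to 0.
CONSTRAINTS = ["y >= 0", "x >= 1", "z <= 5", "x <= 0"]
TRACES = [1e-9, 0.8, 2e-9, 0.7]
CALLS = []

def toy_oracle(prefix):
    """Stand-in for an SDP-based proof attempt; records how often it is called."""
    CALLS.append(len(prefix))
    return {"x >= 1", "x <= 0"} <= set(prefix)

core = smallest_unsat_prefix(CONSTRAINTS, TRACES, toy_oracle)
print(core, len(CALLS))
```

On this toy input the search keeps the two conflicting constraints and drops the two whose multipliers are close to 0.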

#### **5.3 Experimental Results**

We compared our modified version of Alt-Ergo (v. 1.30) to the SMT solvers that ran in both the QF_NIA and QF_NRA sections of the last SMT-COMP. We ran the solvers on two sets of benchmarks. The first set comes from the QF_NIA and QF_NRA benchmarks for the last SMT-COMP. The second set contains four subsets. The C problems are generated by Frama-C/Why3 [12,16] from control-command C programs such as the one from Sect. 2, with up to a dozen variables [11,39]. To distinguish difficulties coming from the handling of the memory model of C, for which Alt-Ergo was particularly designed, from those coming from the actual non-linear arithmetic problem, the quadratic benchmarks contain simplified versions of the C problems with a purely arithmetic goal. To demonstrate that the interest of our approach is not limited to this initial target application, the flyspeck benchmarks come from the benchmark sets of dReal<sup>19</sup> [18] and global-opt are global optimization benchmarks [34]. All these benchmarks are available at https://cavale.enseeiht.fr/osdp/aesdp/. Since our solver only targets unsat proofs, benchmarks known to be sat were removed from both sets.

All experiments were conducted on an Intel Xeon 2.30 GHz processor, with individual runs limited to 2 GB of memory and 900 s. The results are presented in Tables 1, 2 and 3. For each subset of problems, the first column indicates the number of problems that each solver managed to prove unsat and the second presents the cumulative time (in seconds) for these problems. AE is the original Alt-Ergo, AESDP our new version, AESDPap the same but using only the approximate method of Sect. 4.2 and AESDPex using only the exact method of Sect. 4.3. All solvers were run with default options, except CVC4 which was run with all its --nl-ext\* options.

As seen in Tables 1 and 2, despite an improvement over Alt-Ergo alone, our development is not competitive with state-of-the-art solvers on the QF_NIA and QF_NRA benchmarks. In fact, the set of problems solved by any of our Alt-Ergo versions is strictly included in the set of problems solved by at least one of the other solvers. The most commonly observed source of failure for AESDPap here comes from SDPs with empty relative interior. Although AESDPex can handle such problems, it is impaired by its much higher complexity.

<sup>19</sup> Removing problems containing functions sin and cos, not handled by our tool.


**Table 1.** Experimental results on benchmarks from QF_NIA.

However, good results are obtained on the more numerical<sup>20</sup> second set of benchmarks. In particular, control-command programs with up to a dozen variables are verified, while other solvers remain limited to two variables. A key factor in this result is that the inequalities in these benchmarks are satisfied with some margin. For control-command programs, this comes from the fact that they are designed to be robust to many small errors. This opens new perspectives for the verification of functional properties of control-command programs, particularly in the aerospace domain, our main application field at ONERA<sup>21</sup>.

Although solvers such as dReal, based on branch and bound with interval arithmetic, could be expected to perform well on these numerical benchmarks, dReal solves fewer benchmarks than most other solvers. Geometrically speaking, the C benchmarks require proving that an ellipsoid is included in a slightly larger one, i.e., the borders of both ellipsoids are close to one another. This requires subdividing the space between the two borders into many small boxes so that none of them intersects both the interior of the first ellipsoid and the exterior of the second one. Whereas this can remain tractable for low-dimensional ellipsoids, the number of required boxes grows exponentially with the dimension, which explains the poor results of dReal. This issue is unfortunately shared, to a large extent, by any linear relaxation, including more elaborate ones [30].

<sup>20</sup> Involving polynomials with a few dozen monomials or more and whose coefficients are not integers or rationals with small numerators and denominators.

<sup>21</sup> French public agency for aerospace research.


**Table 2.** Experimental results on benchmarks from QF_NRA.

**Table 3.** Experimental results on benchmarks from [11,18,34,39].


## **6 Related Work and Conclusion**

*Related work.* Monniaux and Corbineau [33] improved the rounding heuristic of Harrison [19]. This unfortunately has no impact on the complexity of the relaxation scheme. Platzer et al. [37] compared their early versions with the symbolic methods based on quantifier elimination and Gröbner bases. An intermediate solution is offered by Magron et al. [29], but it only handles a restricted class of parametric problems.

Branch-and-bound and interval arithmetic constitute another numerical approach to non-linear arithmetic, as implemented in the SMT solver dReal by Gao et al. [17,18]. These methods easily handle non-linear functions such as the trigonometric functions sin or cos, not yet considered in our prototype<sup>22</sup>. In the case of polynomial inequalities, Muñoz and Narkawicz [34] offer Bernstein polynomials as an improvement to simple interval arithmetic.

Finally, VSDP [20,22] is a wrapper to SDP solvers offering a similar method to the one of Sect. 4.2. Moreover, an implementation is also offered by Löfberg [28] in the popular Matlab interface Yalmip, but it remains unsound, since all computations are performed with floating-point arithmetic, ignoring rounding errors.

Using convex optimization in an SMT solver was already proposed by Nuzzo et al. [35,43]. However, they intentionally made their solver unsound in order to lean toward completeness. While this can make sense in a bounded model checking context, soundness is required for many applications, such as program verification. Moreover, this proposal was limited to convex formulas. Although this makes it possible to provide models for satisfiable formulas (whereas only unsat formulas are considered in this paper), and although this seems a perfect choice for bounded model checking applications, non-convex formulas are pervasive in applications such as program verification<sup>23</sup>.

The use of numerical off-the-shelf solvers in SMT tools has also been studied in the framework of linear arithmetic [15,32]. Some comparisons with state-of-the-art exact simplex procedures show mixed results [14], but better results can be obtained by combining both approaches [25].

*Conclusion.* We presented a semi-decision procedure for non-linear polynomial constraints over the reals, based on numerical optimization solvers. Since these solvers only compute approximate solutions, a-posteriori soundness checks were investigated. Our first prototype implemented in the Alt-Ergo SMT solver shows that, although the new numerical method does not strictly outperform state-of-the-art symbolic methods, it makes it possible to solve practical problems that are out of reach for other methods. In particular, this is demonstrated on the verification of functional properties of control-command programs. Such properties are of significant importance for critical cyber-physical systems.

It could thus be worth studying the combination of symbolic and numerical methods in the hope of benefiting from the best of both worlds.

<sup>22</sup> Polynomial approximations such as Taylor expansions should be investigated.

<sup>23</sup> Typically, to prove a convex loop invariant I for a loop body f, one needs to prove I ⇒ I(f), that is ¬I ∨ I(f), which is likely non-convex (¬I being concave).

**Data Availability Statement and Acknowledgements.** The source code, benchmarks and instructions to replicate the results of Sect. 5 are available in the figshare repository: http://doi.org/10.6084/m9.figshare.5900260.v1.

The authors thank Rémi Delmas for insightful discussions and technical help, particularly with the dReal solver.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Security and Reactive Systems

# **Approximate Reduction of Finite Automata for High-Speed Network Intrusion Detection**

Milan Češka, Vojtěch Havlena, Lukáš Holík, Ondřej Lengál (✉), and Tomáš Vojnar

> FIT, IT4Innovations Centre of Excellence, Brno University of Technology, Brno, Czech Republic lengal@fit.vutbr.cz

**Abstract.** We consider the problem of *approximate reduction of non-deterministic automata* that appear in hardware-accelerated network intrusion detection systems (NIDSes). We define an error *distance* of a reduced automaton from the original one as the probability of packets being incorrectly classified by the reduced automaton (wrt the probabilistic distribution of packets in the network traffic). We use this notion to design an *approximate reduction procedure* that achieves a great size reduction (much beyond the state-of-the-art language preserving techniques) with a controlled and small error. We have implemented our approach and evaluated it on use cases from Snort, a popular NIDS. Our results provide experimental evidence that the method can be highly efficient in practice, allowing NIDSes to follow the rapid growth in the speed of networks.

### **1 Introduction**

The recent years have seen a boom in the number of security incidents in computer networks. In order to alleviate the impact of network attacks and intrusions, Internet providers want to detect malicious traffic at their network's entry points and on the backbones between sub-networks. Software-based network intrusion detection systems (NIDSes), such as the popular open-source system Snort [1], are capable of detecting suspicious network traffic by testing (among others) whether a packet payload matches a regular expression (regex) describing known patterns of malicious traffic. NIDSes collect and maintain vast databases of such regexes that are typically divided into groups according to types of the attacks and target protocols.

*Regex matching* is the most computationally demanding task of a NIDS as its cost grows with the speed of the network traffic as well as with the number and complexity of the regexes being matched. The current software-based NIDSes cannot perform the regex matching on networks beyond 1 Gbps [2,3], so they cannot handle the current speed of backbone networks ranging between tens and hundreds of Gbps. A promising approach to speed up NIDSes is to (partially) offload regex matching into hardware [3–5]. The hardware then serves as a prefilter of the network traffic, discarding the majority of the packets from further processing. Such pre-filtering can easily reduce the traffic the NIDS needs to handle by two or three orders of magnitude [3].

Field-programmable gate arrays (FPGAs) are the leading technology in high-throughput regex matching. Due to their inherent parallelism, FPGAs provide an efficient way of implementing *nondeterministic finite automata* (NFAs), which naturally arise from the input regexes. Although the amount of available resources in FPGAs is continually increasing, the speed of networks grows even faster. Working with multi-gigabit networks requires the hardware to use many parallel packet processing branches in a single FPGA [5], each of them implementing a separate copy of the concerned NFA, and so reducing the size of the NFAs is of the utmost importance. Various language-preserving automata reduction approaches exist, mainly based on computing (bi)simulation relations on automata states (cf. the related work). The reductions they offer, however, do not satisfy the needs of high-speed hardware-accelerated NIDSes.

Our answer to the problem is *approximate reduction* of NFAs, allowing for a trade-off between the achieved reduction and the precision of the regex matching. To formalise the intuitive notion of precision, we propose a novel *probabilistic distance* of automata. It captures the probability that a packet of the input network traffic is incorrectly accepted or rejected by the approximated NFA. The distance assumes a *probabilistic model* of the network traffic (we show later how such a model can be obtained).

Having formalised the notion of precision, we specify the target of our reductions as two variants of an optimization problem: (1) minimizing the NFA size given the maximum allowed error (distance from the original), or (2) minimizing the error given the maximum allowed NFA size. Finding such optimal approximations is, however, computationally hard (**PSPACE**-complete, the same as precise NFA minimization).

Consequently, we sacrifice the optimality and, motivated by the typical structure of NFAs that emerge from a set of regexes used by NIDSes (a union of many long "tentacles" with occasional small strongly-connected components), we limit the space of possible reductions by restricting the set of operations they can apply to the original automaton. Namely, we consider two reduction operations: (i) collapsing the future of a state into a *self-loop* (this reduction over-approximates the language), or (ii) *removing states* (such a reduction is under-approximating).

The problem of identifying the optimal sets of states on which these operations should be applied is still **PSPACE**-complete. The restricted problem is, however, more amenable to an approximation by a *greedy algorithm*. The algorithm applies the reductions state-by-state in an order determined by a precomputed *error labelling* of the states. The process is stopped once the given optimization goal in terms of the size or error is reached. The labelling is based on the probability of packets that may be accepted through a given state and hence over-approximates the error that may be caused by applying the reduction at a given state. As our experiments show, this approach can give us high-quality reductions while ensuring formal error bounds.

Finally, it turns out that even the pre-computation of the error labelling of the states is costly (again **PSPACE**-complete). Therefore, we propose several ways to cheaply over-approximate it such that the strong error bound guarantees are still preserved. Particularly, we are able to exploit the typical structure of the "union of tentacles" of the hardware NFA in an algorithm that is exponential in the size of the largest "tentacle" only, which is indeed much faster in practice.

We have implemented our approach and evaluated it on regexes used to classify malicious traffic in Snort. We obtain quite encouraging experimental results demonstrating that our approach provides a much better reduction than language-preserving techniques with an almost negligible error. In particular, our experiments, going down to the level of an actual implementation of NFAs in FPGAs, confirm that we can squeeze into an up-to-date FPGA chip real-life regexes encoding malicious traffic, allowing them to be used with a negligible error for filtering at speeds of 100 Gbps (and even 400 Gbps). This is far beyond what one can achieve with current exact reduction approaches.

*Related Work.* Hardware acceleration for regex matching at the line rate is an intensively studied technology that uses general-purpose hardware [6–14] as well as FPGAs [3–5,15–20]. Most of the works focus on DFA implementation and optimization techniques. NFAs can be exponentially smaller than DFAs but need, in the worst case, $\mathcal{O}(n)$ memory accesses to process each byte of the payload, where $n$ is the number of states. In most cases, this incurs an unacceptable slowdown. Several works alleviate this disadvantage of NFAs by exploiting reconfigurability and fine-grained parallelism of FPGAs, allowing one to process one character per clock cycle (e.g. [3–5,15,16,19,20]).

In [14], which is probably the closest work to ours, the authors consider a set of regexes describing network attacks. They replace a potentially prohibitively large DFA by a tree of smaller DFAs, an alternative to using NFAs that minimizes the latency occurring in a non-FPGA-based implementation. The language of every DFA-node in the tree over-approximates the languages of its children. Packets are filtered through the tree from the root downwards as long as they belong to the languages of the encountered nodes, and may be finally accepted at the leaves, or are rejected otherwise. The over-approximating DFAs are constructed using a similar notion of probability of an occurrence of a state as in our approach. The main differences from our work are that (1) the approach targets approximation of DFAs (not NFAs), (2) the over-approximation is based on a given traffic sample only (it cannot benefit from a probabilistic model), and (3) no probabilistic guarantees on the approximation error are provided.

Approximation of DFAs was considered in various other contexts. Hyper-minimization is an approach that is allowed to alter language membership of a finite set of words [21,22]. A DFA with a given maximum number of states is constructed in [23], minimizing the error defined either by (i) counting prefixes of misjudged words up to some length, or (ii) the sum of the probabilities of the misjudged words wrt the Poisson distribution over $\Sigma^*$. Neither of these approaches considers reduction of NFAs nor allows one to control the expected error with respect to the real traffic.

In addition to the metrics mentioned above when discussing the works [21–23], the following metrics should also be mentioned. The Cesaro-Jaccard distance studied in [24] is, in spirit, similar to [23] and also does not reflect the probability of individual words. The edit distance of weighted automata from [25] depends on the minimum edit distance between pairs of words from the two compared languages, again regardless of their statistical significance. None of these notions is suitable for our needs.

Language-preserving minimization of NFAs is a **PSPACE**-complete problem [26,38]. More feasible (polynomial-time) is language-preserving size reduction of NFAs based on (bi)simulations [27–30], which does not aim for a truly minimal NFA. A number of advanced variants exist, based on multi-pebble or look-ahead simulations, or on combinations of forward and backward simulations [31–33]. The practical efficiency of these techniques is, however, often insufficient to allow them to handle the large NFAs that occur in practice and/or they do not manage to reduce the NFAs enough. Finally, even a minimal NFA for the given set of regexes is often too big to be implemented in the given FPGA operating at the required speed (as shown even in our experiments). Our approach is capable of a much better reduction at the price of a small change of the accepted language.

### **2 Preliminaries**

We use $\langle a, b \rangle$ to denote the set $\{x \in \mathbb{R} \mid a \leq x \leq b\}$ and $\mathbb{N}$ to denote the set $\{0, 1, 2, \ldots\}$. Given a pair of sets $X_1$ and $X_2$, we use $X_1 \triangle X_2$ to denote their *symmetric difference*, i.e., the set $\{x \mid \exists! i \in \{1, 2\} : x \in X_i\}$. We use the notation $[v_1, \ldots, v_n]$ to denote a vector of $n$ elements, $\mathbf{1}$ to denote the all 1's vector $[1, \ldots, 1]$, $\boldsymbol{A}$ to denote a matrix, $\boldsymbol{A}^\top$ for its transpose, and $\boldsymbol{I}$ for the identity matrix.

In the following, we fix a finite non-empty alphabet $\Sigma$. A *nondeterministic finite automaton* (NFA) is a quadruple $A = (Q, \delta, I, F)$ where $Q$ is a finite set of states, $\delta : Q \times \Sigma \to 2^Q$ is a transition function, $I \subseteq Q$ is a set of initial states, and $F \subseteq Q$ is a set of accepting states. We use $Q[A]$, $\delta[A]$, $I[A]$, and $F[A]$ to denote $Q$, $\delta$, $I$, and $F$, respectively, and $q \xrightarrow{a} q'$ to denote that $q' \in \delta(q, a)$. A sequence of states $\rho = q_0 \cdots q_n$ is a *run* of $A$ over a word $w = a_1 \cdots a_n \in \Sigma^*$ from a state $q$ to a state $q'$, denoted as $q \overset{w,\rho}{\rightsquigarrow} q'$, if $\forall 1 \leq i \leq n : q_{i-1} \xrightarrow{a_i} q_i$, $q_0 = q$, and $q_n = q'$. Sometimes, we use $\rho$ in set operations where it behaves as the set of states it contains. We also use $q \overset{w}{\rightsquigarrow} q'$ to denote that $\exists \rho \in Q^* : q \overset{w,\rho}{\rightsquigarrow} q'$ and $q \rightsquigarrow q'$ to denote that $\exists w : q \overset{w}{\rightsquigarrow} q'$. The *language* of a state $q$ is defined as $L_A(q) = \{w \mid \exists q_F \in F : q \overset{w}{\rightsquigarrow} q_F\}$ and its *banguage* (back-language) is defined as $L^\flat_A(q) = \{w \mid \exists q_I \in I : q_I \overset{w}{\rightsquigarrow} q\}$. Both notions can be naturally extended to a set $S \subseteq Q$: $L_A(S) = \bigcup_{q \in S} L_A(q)$ and $L^\flat_A(S) = \bigcup_{q \in S} L^\flat_A(q)$. We drop the subscript $A$ when the context is obvious. $A$ *accepts* the language $L(A)$ defined as $L(A) = L_A(I)$. $A$ is called *deterministic* (DFA) if $|I| = 1$ and $\forall q \in Q$ and $\forall a \in \Sigma : |\delta(q, a)| \leq 1$, and *unambiguous* (UFA) if $\forall w \in L(A) : \exists! q_I \in I, \rho \in Q^*, q_F \in F : q_I \overset{w,\rho}{\rightsquigarrow} q_F$.
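To make the definitions concrete, the following minimal sketch encodes an NFA as a quadruple and decides membership by tracking the set of states reachable over the word read so far. The class layout and the example automaton are assumptions made for illustration only and are not part of the implementation evaluated in this paper.

```python
from dataclasses import dataclass


@dataclass
class NFA:
    states: set
    delta: dict      # (state, symbol) -> set of successor states
    initial: set
    final: set

    def step(self, current, symbol):
        """All states reachable from `current` by reading `symbol`."""
        nxt = set()
        for q in current:
            nxt |= self.delta.get((q, symbol), set())
        return nxt

    def accepts(self, word):
        """True iff some run over `word` leads from I to F."""
        current = set(self.initial)
        for symbol in word:
            current = self.step(current, symbol)
        return bool(current & self.final)

    def is_deterministic(self):
        """|I| = 1 and at most one successor per state and symbol."""
        return len(self.initial) == 1 and all(
            len(succ) <= 1 for succ in self.delta.values()
        )


# Hypothetical example: words over {a, b} that end with "ab".
ends_with_ab = NFA(
    states={0, 1, 2},
    delta={(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}},
    initial={0},
    final={2},
)
```

The example automaton is nondeterministic (state 0 has two successors over `a`), yet every accepted word has exactly one accepting run, so it is unambiguous in the sense defined above.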

The *restriction* of $A$ to $S \subseteq Q$ is an NFA $A_{|S}$ given as $A_{|S} = (S, \delta \cap (S \times \Sigma \times 2^S), I \cap S, F \cap S)$. We define the *trim* operation as $\mathit{trim}(A) = A_{|C}$ where $C = \{q \mid \exists q_I \in I, q_F \in F : q_I \rightsquigarrow q \rightsquigarrow q_F\}$. For a set of states $R \subseteq Q$, we use $\mathit{reach}(R)$ to denote the set of states reachable from $R$, formally, $\mathit{reach}(R) = \{r' \mid \exists r \in R : r \rightsquigarrow r'\}$. We use the number of states as the measurement of the size of $A$, i.e., $|A| = |Q|$.
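The *reach* and *trim* operations can be sketched as two graph searches, one forward from the initial states and one backward from the accepting states. The dictionary-based encoding of $\delta$ (keys `(state, symbol)`, values sets of successors) is an assumption of this sketch.

```python
def reach(adjacency, sources):
    """States reachable from `sources` in the graph `adjacency`
    (including the sources themselves, via the empty word)."""
    seen, stack = set(sources), list(sources)
    while stack:
        q = stack.pop()
        for r in adjacency.get(q, ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def trim(states, delta, initial, final):
    """Restrict an NFA to the states lying on some path from I to F."""
    forward, backward = {}, {}
    for (p, _symbol), successors in delta.items():
        for r in successors:
            forward.setdefault(p, set()).add(r)
            backward.setdefault(r, set()).add(p)
    useful = states & reach(forward, initial) & reach(backward, final)
    new_delta = {}
    for (p, symbol), successors in delta.items():
        kept = successors & useful
        if p in useful and kept:
            new_delta[(p, symbol)] = kept
    return useful, new_delta, initial & useful, final & useful


# Hypothetical example: state 3 is a dead end, state 4 is unreachable.
example = trim(
    states={0, 1, 2, 3, 4},
    delta={(0, "a"): {1, 3}, (1, "b"): {2}, (4, "a"): {2}},
    initial={0},
    final={2},
)
```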

A (discrete probability) *distribution* over a set $X$ is a mapping $\Pr : X \to \langle 0, 1 \rangle$ such that $\sum_{x \in X} \Pr(x) = 1$. An $n$-state *probabilistic automaton* (PA) over $\Sigma$ is a triple $P = (\boldsymbol{\alpha}, \boldsymbol{\gamma}, \{\boldsymbol{\Delta}_a\}_{a \in \Sigma})$ where $\boldsymbol{\alpha} \in \langle 0, 1 \rangle^n$ is a vector of *initial weights*, $\boldsymbol{\gamma} \in \langle 0, 1 \rangle^n$ is a vector of *final weights*, and for every $a \in \Sigma$, $\boldsymbol{\Delta}_a \in \langle 0, 1 \rangle^{n \times n}$ is a *transition matrix* for symbol $a$. We abuse notation and use $Q[P]$ to denote the set of states $Q[P] = \{1, \ldots, n\}$. Moreover, the following two properties need to hold: (i) $\sum \{\boldsymbol{\alpha}[i] \mid i \in Q[P]\} = 1$ (the initial probability is 1) and (ii) for every state $i \in Q[P]$ it holds that $\sum \{\boldsymbol{\Delta}_a[i, j] \mid j \in Q[P], a \in \Sigma\} + \boldsymbol{\gamma}[i] = 1$ (the probability of accepting or leaving a state is 1). We define the *support* of $P$ as the NFA $\mathit{supp}(P) = (Q[P], \delta[P], I[P], F[P])$ s.t.

$$\delta[P] = \{(i, a, j) \mid \Delta\_a[i, j] > 0\} \quad I[P] = \{i \mid \alpha[i] > 0\} \quad F[P] = \{i \mid \gamma[i] > 0\}.$$
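As a small illustration of the support construction, the sketch below extracts $\delta[P]$, $I[P]$, and $F[P]$ from the weights of a PA. The list-of-lists matrix encoding, the 0-based state numbering (the text uses $1, \ldots, n$), and the example PA are assumptions of the sketch.

```python
def support(alpha, gamma, delta):
    """Return (Q, transitions, I, F) of the NFA underlying a PA.

    `delta` maps each symbol to an n x n matrix given as a list of rows.
    """
    n = len(alpha)
    transitions = {
        (i, a, j)
        for a, matrix in delta.items()
        for i in range(n)
        for j in range(n)
        if matrix[i][j] > 0
    }
    initial = {i for i in range(n) if alpha[i] > 0}
    final = {i for i in range(n) if gamma[i] > 0}
    return set(range(n)), transitions, initial, final


# Hypothetical two-state PA over {a, b}.
alpha = [1.0, 0.0]
gamma = [0.25, 1.0]
delta = {
    "a": [[0.5, 0.0], [0.0, 0.0]],
    "b": [[0.0, 0.25], [0.0, 0.0]],
}
supp = support(alpha, gamma, delta)
```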

Let us assume that every PA $P$ is such that $\mathit{supp}(P) = \mathit{trim}(\mathit{supp}(P))$. For a word $w = a_1 \ldots a_k \in \Sigma^*$, we use $\boldsymbol{\Delta}_w$ to denote the matrix $\boldsymbol{\Delta}_{a_1} \cdots \boldsymbol{\Delta}_{a_k}$. It can be easily shown that $P$ represents a distribution over words $w \in \Sigma^*$ defined as $\Pr_P(w) = \boldsymbol{\alpha}^\top \cdot \boldsymbol{\Delta}_w \cdot \boldsymbol{\gamma}$. We call $\Pr_P(w)$ the *probability* of $w$ in $P$. Given a language $L \subseteq \Sigma^*$, we define the probability of $L$ in $P$ as $\Pr_P(L) = \sum_{w \in L} \Pr_P(w)$.
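The probability of a single word can be evaluated by pushing the row vector of initial weights through the transition matrices of the word, one symbol at a time. The following sketch uses exact rational arithmetic; the helper names and the two-state example PA are hypothetical.

```python
from fractions import Fraction as F


def vec_mat(v, m):
    """Row vector times matrix (matrix given as a list of rows)."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def word_probability(alpha, gamma, delta, word):
    """Pr_P(w) = alpha^T . Delta_{a_1} ... Delta_{a_k} . gamma."""
    row = list(alpha)
    for symbol in word:
        row = vec_mat(row, delta[symbol])
    return sum(x * g for x, g in zip(row, gamma))


def is_probabilistic(alpha, gamma, delta):
    """Check conditions (i) and (ii) from the definition of a PA."""
    n = len(alpha)
    rows_ok = all(
        sum(delta[a][i][j] for a in delta for j in range(n)) + gamma[i] == 1
        for i in range(n)
    )
    return sum(alpha) == 1 and rows_ok


# Hypothetical two-state PA over {a, b} (states numbered from 0).
ALPHA = [F(1), F(0)]
GAMMA = [F(1, 4), F(1)]
DELTA = {
    "a": [[F(1, 2), F(0)], [F(0), F(0)]],
    "b": [[F(0), F(1, 4)], [F(0), F(0)]],
}
```

In this example PA, only the words of the form $a^n$ and $a^n b$ have a non-zero probability.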

If Conditions (i) and (ii) from the definition of PAs are dropped, we speak about a *pseudo-probabilistic automaton (PPA)*, which may assign a word from its support a quantity that is not necessarily in the range $\langle 0, 1 \rangle$, denoted as the *significance* of the word below. PPAs may arise during some of our operations performed on PAs.

### **3 Approximate Reduction of NFAs**

In this section, we first introduce the key notion of our approach: a *probabilistic distance* of a pair of finite automata wrt a given probabilistic automaton that, intuitively, represents the significance of particular words. We discuss the complexity of computing the probabilistic distance. Finally, we formulate two problems of *approximate automata reduction via probabilistic distance*. Proofs of the lemmas can be found in [43].

#### **3.1 Probabilistic Distance**

We start by defining our notion of a probabilistic distance of two NFAs. Assume NFAs $A_1$ and $A_2$ and a probabilistic automaton $P$ specifying the distribution $\Pr_P : \Sigma^* \to \langle 0, 1 \rangle$. The *probabilistic distance* $d_P(A_1, A_2)$ between $A_1$ and $A_2$ wrt $\Pr_P$ is defined as

$$d\_P(A\_1, A\_2) = \Pr\_P(L(A\_1) \triangle L(A\_2)).$$

Intuitively, the distance captures the significance of the words accepted by one of the automata only. We use the distance to drive the reduction process towards automata with small errors and to assess the quality of the resulting automata.

The value of $\Pr_P(L(A_1) \triangle L(A_2))$ can be computed as follows. Using the fact that (1) $L_1 \triangle L_2 = (L_1 \setminus L_2) \uplus (L_2 \setminus L_1)$ and (2) $L_1 \setminus L_2 = L_1 \setminus (L_1 \cap L_2)$, we get

$$\begin{aligned} d\_P(A\_1, A\_2) &= \Pr\_P(L(A\_1) \setminus L(A\_2)) + \Pr\_P(L(A\_2) \setminus L(A\_1)) \\ &= \Pr\_P(L(A\_1) \setminus (L(A\_1) \cap L(A\_2))) + \Pr\_P(L(A\_2) \setminus (L(A\_2) \cap L(A\_1))) \\ &= \Pr\_P(L(A\_1)) + \Pr\_P(L(A\_2)) - 2 \cdot \Pr\_P(L(A\_1) \cap L(A\_2)). \end{aligned}$$
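The identity derived above can be checked on a toy instance. The sketch below replaces the PA by an explicitly listed finite distribution over four words and the automata by finite languages; both simplifications, as well as the function names, are assumptions made only for this illustration.

```python
from fractions import Fraction as F


def pr(dist, language):
    """Probability mass that `dist` assigns to the words in `language`."""
    return sum((p for w, p in dist.items() if w in language), F(0))


def distance(dist, lang1, lang2):
    """Probability of the symmetric difference of the two languages."""
    return pr(dist, lang1 ^ lang2)


# Toy distribution over words and two toy languages.
dist = {"": F(1, 4), "a": F(1, 4), "b": F(1, 4), "ab": F(1, 4)}
l1, l2 = {"a", "ab"}, {"ab", "b"}

direct = distance(dist, l1, l2)
via_identity = pr(dist, l1) + pr(dist, l2) - 2 * pr(dist, l1 & l2)
```

Both computations yield the same value, here the probability of the two words `"a"` and `"b"` on which the languages disagree.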

Hence, the key step is to compute $\Pr_P(L(A))$ for an NFA $A$ and a PA $P$. Problems similar to computing such a probability have been extensively studied in several contexts including verification of probabilistic systems [34–36]. The following lemma summarises the complexity of this step.

**Lemma 1.** *Let $P$ be a PA and $A$ an NFA. The problem of computing $\Pr_P(L(A))$ is PSPACE-complete. For a UFA $A$, $\Pr_P(L(A))$ can be computed in PTIME.*

In our approach, we apply the method of [36] and compute $\Pr_P(L(A))$ in the following way. We first check whether the NFA $A$ is unambiguous. This can be done by using the standard product construction (denoted as $\cap$) for computing the intersection of the NFA $A$ with itself and trimming the result, formally $B = \mathit{trim}(A \cap A)$, followed by a check whether there is some state $(p, q) \in Q[B]$ s.t. $p \neq q$ [37]. If $A$ is ambiguous, we either determinise it or disambiguate it [37], leading to a DFA/UFA $A'$, respectively.<sup>1</sup> Then, we construct the trimmed product of $A'$ and $P$ (this can be seen as computing $A' \cap \mathit{supp}(P)$ while keeping the probabilities from $P$ on the edges of the result), yielding a PPA $R = (\boldsymbol{\alpha}, \boldsymbol{\gamma}, \{\boldsymbol{\Delta}_a\}_{a \in \Sigma})$.<sup>2</sup> Intuitively, $R$ represents not only the words of $L(A)$ but also their probability in $P$. Now, let $\boldsymbol{\Delta} = \sum_{a \in \Sigma} \boldsymbol{\Delta}_a$ be the matrix that expresses, for any $p, q \in Q[R]$, the significance of getting from $p$ to $q$ via any $a \in \Sigma$. Further, it can be shown (cf. the proof of Lemma 1 in [43]) that the matrix $\boldsymbol{\Delta}^*$, representing the significance of going from $p$ to $q$ via any $w \in \Sigma^*$, can be computed as $(\boldsymbol{I} - \boldsymbol{\Delta})^{-1}$. Then, to get $\Pr_P(L(A))$, it suffices to take $\boldsymbol{\alpha}^\top \cdot \boldsymbol{\Delta}^* \cdot \boldsymbol{\gamma}$. Note that, due to the determinisation/disambiguation step, the obtained value indeed is $\Pr_P(L(A))$ despite $R$ being a PPA.
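The unambiguity check via the self-product can be sketched as follows. Instead of materialising the whole product and trimming it afterwards, the sketch explores only the pairs reachable from $I \times I$ and then keeps those that can reach an accepting pair; this is one possible way to obtain the states of the trimmed product, chosen here for brevity. The encoding of $\delta$ and the two example automata are assumptions.

```python
from itertools import product


def is_unambiguous(delta, initial, final):
    """True iff the trimmed self-product has only states (p, q) with p == q.

    `delta` maps (state, symbol) to a set of successor states.
    """
    symbols = {a for (_q, a) in delta}
    # Forward exploration of the product A x A from I x I.
    forward = set(product(initial, initial))
    stack = list(forward)
    preds = {}  # product state -> set of its predecessors
    while stack:
        p, q = stack.pop()
        for a in symbols:
            for p2 in delta.get((p, a), ()):
                for q2 in delta.get((q, a), ()):
                    preds.setdefault((p2, q2), set()).add((p, q))
                    if (p2, q2) not in forward:
                        forward.add((p2, q2))
                        stack.append((p2, q2))
    # Backward exploration from the accepting pairs in F x F.
    useful = {(p, q) for (p, q) in forward if p in final and q in final}
    stack = list(useful)
    while stack:
        pair = stack.pop()
        for pred in preds.get(pair, ()):
            if pred not in useful:
                useful.add(pred)
                stack.append(pred)
    return all(p == q for (p, q) in useful)


# Hypothetical examples: "ends with ab" (unambiguous) and an automaton
# with two distinct accepting runs over the word "ab" (ambiguous).
unamb = is_unambiguous(
    {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}, {0}, {2}
)
amb = is_unambiguous(
    {(0, "a"): {1, 2}, (1, "b"): {3}, (2, "b"): {3}}, {0}, {3}
)
```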

<sup>1</sup> In theory, disambiguation can produce smaller automata, but, in our experiments, determinisation proved to work better.

<sup>2</sup> R is not necessarily a PA since there might be transitions in P that are either removed or copied several times in the product construction.
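The last steps of the computation from Sect. 3.1 can be sketched for the case where the automaton is already a DFA. The sketch builds the product with the PA, sums the transition matrices into a single matrix, and, instead of inverting $\boldsymbol{I} - \boldsymbol{\Delta}$ explicitly, solves the equivalent linear system $(\boldsymbol{I} - \boldsymbol{\Delta}) \cdot \boldsymbol{x} = \boldsymbol{\gamma}$ and returns $\boldsymbol{\alpha}^\top \cdot \boldsymbol{x}$. Exact rationals, the untrimmed product, the function names, and the example PA are assumptions of this sketch, not the implementation used in the experiments.

```python
from fractions import Fraction as F


def solve(a, b):
    """Solve a . x = b by Gauss-Jordan elimination over exact fractions."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        inv = 1 / m[col][col]
        m[col] = [x * inv for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def language_probability(dfa_delta, dfa_init, dfa_final, alpha, gamma, delta):
    """Pr_P(L(A)) for a DFA A (dfa_delta: (state, symbol) -> state) and a PA P."""
    n = len(alpha)
    dfa_states = ({dfa_init} | set(dfa_final)
                  | {q for (q, _a) in dfa_delta} | set(dfa_delta.values()))
    pairs = [(q, i) for q in sorted(dfa_states) for i in range(n)]
    index = {pair: k for k, pair in enumerate(pairs)}
    size = len(pairs)
    big = [[F(0)] * size for _ in range(size)]   # sum of product matrices
    for (q, a), q2 in dfa_delta.items():
        for i in range(n):
            for j in range(n):
                big[index[q, i]][index[q2, j]] += delta[a][i][j]
    alpha_r = [alpha[i] if q == dfa_init else F(0) for (q, i) in pairs]
    gamma_r = [gamma[i] if q in dfa_final else F(0) for (q, i) in pairs]
    lhs = [[(F(1) if r == c else F(0)) - big[r][c] for c in range(size)]
           for r in range(size)]
    x = solve(lhs, gamma_r)                      # x = (I - Delta)^{-1} . gamma
    return sum(p * v for p, v in zip(alpha_r, x))


# Hypothetical PA over {a, b}: only a^n and a^n b have non-zero probability.
ALPHA = [F(1), F(0)]
GAMMA = [F(1, 4), F(1)]
DELTA = {"a": [[F(1, 2), F(0)], [F(0), F(0)]],
         "b": [[F(0), F(1, 4)], [F(0), F(0)]]}

ends_in_b = language_probability(
    {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}, 0, {1},
    ALPHA, GAMMA, DELTA)
even_as = language_probability(
    {(0, "a"): 1, (1, "a"): 0}, 0, {0}, ALPHA, GAMMA, DELTA)
```

Solving the linear system avoids computing the full inverse, since only one vector-matrix-vector product is needed in the end.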

#### **3.2 Automata Reduction Using Probabilistic Distance**

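We now exploit the above introduced probabilistic distance to formulate the task of approximate reduction of NFAs as the following two optimisation problems. Given an NFA $A$ and a PA $P$ specifying the distribution $\Pr_P : \Sigma^* \to \langle 0, 1 \rangle$, we define

- *size-driven reduction*: for $n \in \mathbb{N}$, find an NFA $A'$ such that $|A'| \leq n$ and the distance $d_P(A, A')$ is minimal,
- *error-driven reduction*: for $\epsilon \in \langle 0, 1 \rangle$, find an NFA $A'$ such that $d_P(A, A') \leq \epsilon$ and the size $|A'|$ is minimal.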


The following lemma shows that the natural decision problem underlying both of the above optimization problems is **PSPACE**-complete, which matches the complexity of computing the probabilistic distance as well as that of the *exact* reduction of NFAs [38].

**Lemma 2.** *Consider an NFA $A$, a PA $P$, a bound on the number of states $n \in \mathbb{N}$, and an error bound $\epsilon \in \langle 0, 1 \rangle$. It is PSPACE-complete to determine whether there exists an NFA $A'$ with $n$ states s.t. $d_P(A, A') \leq \epsilon$.*

The notions defined above do not distinguish between introducing a *false positive* ($A'$ accepts a word $w \notin L(A)$) and a *false negative* ($A'$ does not accept a word $w \in L(A)$). To make this distinction, we define *over-approximating* and *under-approximating* reductions as reductions for which the additional conditions $L(A) \subseteq L(A')$ and $L(A) \supseteq L(A')$ hold, respectively.

A naïve solution to the reductions would enumerate all NFAs $A'$ of sizes from 0 up to $n$ (resp. $|A|$), for each of them compute $d_P(A, A')$, and take an automaton with the smallest probabilistic distance (resp. a smallest one satisfying the restriction on $d_P(A, A')$). Obviously, this approach is computationally infeasible.
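As a rough illustration of the size of this search space (a back-of-the-envelope count under stated assumptions, not a result from the proofs in [43]), consider the candidates over a fixed state set $\{1, \ldots, k\}$. Each of the $k^2 \cdot |\Sigma|$ possible transitions, each of the $k$ initial flags, and each of the $k$ accepting flags can be chosen independently, so the number of candidates with exactly $k$ states is

$$2^{k^2 \cdot |\Sigma|} \cdot 2^{k} \cdot 2^{k} = 2^{k^2 \cdot |\Sigma| + 2k}.$$

Assuming a byte alphabet ($|\Sigma| = 256$), already $k = 10$ gives $2^{25620}$ candidates (counted without identifying isomorphic automata), each of which would require a costly distance computation (cf. Lemma 1).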

### **4 A Heuristic Approach to Approximate Reduction**

In this section, we introduce two techniques for approximate reduction of NFAs that avoid the need to iterate over all automata of a certain size. The first approach under-approximates the automata by removing states (we call it the *pruning reduction*), while the second approach over-approximates the automata by adding self-loops to states and removing redundant states (we call it the *self-loop reduction*). Finding an optimal automaton using these reductions is also **PSPACE**-complete, but more amenable to heuristics like greedy algorithms. We start with introducing two high-level greedy algorithms, one for the size-driven and one for the error-driven reduction, and follow by showing their instantiations for the pruning and the self-loop reduction. A crucial role in the algorithms is played by a function that labels states of the automata by an estimate of the error that will be caused when one of the reductions is applied at a given state.

#### **4.1 A General Algorithm for Size-Driven Reduction**

Algorithm 1 shows a general greedy method for performing the size-driven reduction. In order to use the same high-level algorithm in both directions of reduction (over/under-approximating), it is parameterized with three functions: *label*, *reduce*, and *error*. The real intricacy of the procedure is hidden inside these three functions. Intuitively, $\mathit{label}(A, P)$ assigns every state of an NFA $A$ an approximation of the error that will be caused wrt the PA $P$ when a reduction is applied at this state, while the purpose of $\mathit{reduce}(A, V)$ is to create a new NFA $A'$ obtained from $A$ by introducing some error at states from $V$.<sup>3</sup> Further, $\mathit{error}(A, V, \mathit{label}(A, P))$ estimates the error introduced by the application of $\mathit{reduce}(A, V)$, possibly in a more precise (and costly) way than by just summing the concerned error labels: such a computation is possible outside of the main computation loop. We show instantiations of these functions later, when discussing the reductions used. Moreover, the algorithm is also parameterized with a total order $\preceq_{A, \mathit{label}(A,P)}$ that defines which states of $A$ are processed first and which are processed later. The ordering may take into account the precomputed labelling. The algorithm accepts an NFA $A$, a PA $P$, and $n \in \mathbb{N}$ and outputs a pair consisting of an NFA $A'$ of the size $|A'| \le n$ and an error bound $\epsilon$ such that $d_P(A, A') \le \epsilon$.

The main idea of the algorithm is that it creates a set $V$ of states where an error is to be introduced. $V$ is constructed by starting from an empty set and adding states to it in the order given by $\preceq_{A, \mathit{label}(A,P)}$, until the size of the result of $\mathit{reduce}(A, V)$ has reached the desired bound $n$ (in our setting, *reduce* is always antitone, i.e., for $V \subseteq V'$, it holds that $|\mathit{reduce}(A, V)| \ge |\mathit{reduce}(A, V')|$). We now define the necessary condition for *label*, *reduce*, and *error* that makes Algorithm 1 correct.

**Condition C1** *holds if for every NFA $A$, PA $P$, and a set $V \subseteq Q[A]$, we have that (a) $\mathit{error}(A, V, \mathit{label}(A, P)) \ge d_P(A, \mathit{reduce}(A, V))$, (b) $|\mathit{reduce}(A, Q[A])| \le 1$, and (c) $\mathit{reduce}(A, \emptyset) = A$.*

**C1**(a) ensures that the error computed by the reduction algorithm indeed over-approximates the exact probabilistic distance, **C1**(b) ensures that the algorithm can (in the worst case, by applying the reduction at every state of $A$) for any $n \ge 1$ output a result $A'$ of the size $|A'| \le n$, and **C1**(c) ensures that when no error is to be introduced at any state, we obtain the original automaton.

**Lemma 3.** *Algorithm 1 is correct if C1 holds.*
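The listing of Algorithm 1 is not reproduced in this text, so the following Python fragment is only a sketch of the greedy loop as the prose above describes it. The names `size_driven_reduction`, `toy_reduce`, and `toy_error` are hypothetical, the automaton is abstracted to a bare set of states, and the order is assumed to process states with the smallest label first.

```python
from typing import Callable, Dict, FrozenSet, Set, Tuple

Aut = FrozenSet[str]  # toy stand-in for an NFA: just its set of states


def size_driven_reduction(
    aut: Aut,
    n: int,
    labels: Dict[str, float],
    reduce: Callable[[Aut, Set[str]], Aut],
    error: Callable[[Aut, Set[str], Dict[str, float]], float],
) -> Tuple[Aut, float]:
    """Grow V greedily until |reduce(aut, V)| <= n; return (A', error bound)."""
    chosen: Set[str] = set()
    reduced = reduce(aut, chosen)
    # Assumed order: states with the smallest error label are processed first.
    for q in sorted(aut, key=lambda s: labels[s]):
        if len(reduced) <= n:
            break
        chosen.add(q)
        reduced = reduce(aut, chosen)
    # The (possibly costly) error estimate is computed once, outside the loop.
    return reduced, error(aut, chosen, labels)


# Hypothetical instantiation: "reduction" drops the chosen states,
# and the error estimate is the plain sum of their labels.
def toy_reduce(aut: Aut, chosen: Set[str]) -> Aut:
    return frozenset(aut) - frozenset(chosen)


def toy_error(aut: Aut, chosen: Set[str], labels: Dict[str, float]) -> float:
    return sum(labels[q] for q in chosen)
```

With labels `{"a": 0.1, "b": 0.01, "c": 0.5, "d": 0.2}` and `n = 2`, the loop picks `b` and then `a` and stops once two states remain. The toy functions satisfy **C1**(b) and **C1**(c) trivially; **C1**(a) is a property of the real instantiations and is not checked here.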

<sup>3</sup> We emphasize that this does not mean that states from V will be simply removed from A—the performed operation depends on the particular reduction.

#### **4.2 A General Algorithm for Error-Driven Reduction**

In Algorithm 2, we provide a high-level method of computing the error-driven reduction. The algorithm is in many ways similar to Algorithm 1; it also computes a set of states $V$ where an error is to be introduced, but an important difference is that we compute an approximation of the error in each step and only add $q$ to $V$ if it does not raise the error over the threshold $\epsilon$. Note that *error* does not need to be monotone, so it may be advantageous to traverse all states from $Q$ and not terminate as soon as the threshold is reached. The correctness of Algorithm 2 also depends on **C1**.

**Algorithm 2.** A greedy error-driven reduction.

**Input**: NFA $A = (Q, \delta, I, F)$, PA $P$, $\epsilon \in \langle 0, 1 \rangle$

**Output**: NFA $A'$ s.t. $d_P(A, A') \le \epsilon$

1. $\ell \leftarrow \mathit{label}(A, P)$;
2. $V \leftarrow \emptyset$;
3. **for** $q \in Q$ *in the order* $\preceq_{A, \mathit{label}(A,P)}$ **do**
4. &nbsp;&nbsp;&nbsp;&nbsp;$e \leftarrow \mathit{error}(A, V \cup \{q\}, \ell)$;
5. &nbsp;&nbsp;&nbsp;&nbsp;**if** $e \le \epsilon$ **then** $V \leftarrow V \cup \{q\}$;
6. **return** $A' = \mathit{reduce}(A, V)$;

**Lemma 4.** *Algorithm 2 is correct if C1 holds.*
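For illustration, the loop of Algorithm 2 can be transcribed into Python almost line by line. This is a sketch under simplifying assumptions: the automaton is abstracted to a set of states, the order is assumed to be ascending by label, and `drop_states` and `sum_labels` are hypothetical stand-ins for *reduce* and *error*.

```python
from typing import Callable, Dict, FrozenSet, Set

Aut = FrozenSet[str]  # toy stand-in for an NFA: just its set of states


def error_driven_reduction(
    aut: Aut,
    eps: float,
    labels: Dict[str, float],
    reduce: Callable[[Aut, Set[str]], Aut],
    error: Callable[[Aut, Set[str], Dict[str, float]], float],
) -> Aut:
    """Add a state to V only if the estimated error stays within eps."""
    chosen: Set[str] = set()
    for q in sorted(aut, key=lambda s: labels[s]):  # assumed order
        # No early exit: error need not be monotone, so later states
        # may still fit under the threshold.
        if error(aut, chosen | {q}, labels) <= eps:
            chosen.add(q)
    return reduce(aut, chosen)


# Hypothetical instantiation used only for the example.
def drop_states(aut: Aut, chosen: Set[str]) -> Aut:
    return frozenset(aut) - frozenset(chosen)


def sum_labels(aut: Aut, chosen: Set[str], labels: Dict[str, float]) -> float:
    return sum(labels[q] for q in chosen)
```

With labels `{"a": 0.1, "b": 0.01, "c": 0.5, "d": 0.03}` and a threshold of 0.15, the states `b`, `d`, and `a` are selected (estimated error about 0.14) and `c` is rejected.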

#### **4.3 Pruning Reduction**

The pruning reduction is based on identifying a set of states to be removed from an NFA $A$, under-approximating the language of $A$. In particular, for $A = (Q, \delta, I, F)$, the pruning reduction finds a set $R \subseteq Q$ and restricts $A$ to $Q \setminus R$, followed by removing useless states, to construct a reduced automaton $A' = \mathit{trim}(A|_{Q \setminus R})$. Note that the natural decision problem corresponding to this reduction is also **PSPACE**-complete.
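The restriction-then-trim step can be sketched in Python as follows. The `NFA` class and the helper names are hypothetical and are not taken from Appreal; "useless" is interpreted here as a state that is not reachable from an initial state or cannot reach a final state.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Set, Tuple


@dataclass
class NFA:
    states: FrozenSet[str]
    delta: Dict[Tuple[str, str], FrozenSet[str]]  # (state, symbol) -> successors
    initial: FrozenSet[str]
    final: FrozenSet[str]


def restrict(a: NFA, keep: FrozenSet[str]) -> NFA:
    """Keep only the states in `keep` and the transitions among them."""
    delta = {}
    for (p, sym), succs in a.delta.items():
        kept = succs & keep
        if p in keep and kept:
            delta[(p, sym)] = kept
    return NFA(a.states & keep, delta, a.initial & keep, a.final & keep)


def closure(start: Iterable[str], edges: Dict[str, Set[str]]) -> Set[str]:
    """All states reachable from `start` along `edges`."""
    seen, stack = set(start), list(start)
    while stack:
        for q in edges.get(stack.pop(), ()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen


def trim(a: NFA) -> NFA:
    """Remove states that are unreachable or cannot reach a final state."""
    fwd: Dict[str, Set[str]] = {}
    bwd: Dict[str, Set[str]] = {}
    for (p, _sym), succs in a.delta.items():
        for q in succs:
            fwd.setdefault(p, set()).add(q)
            bwd.setdefault(q, set()).add(p)
    useful = closure(a.initial, fwd) & closure(a.final, bwd)
    return restrict(a, frozenset(useful))


def prune(a: NFA, removed: Iterable[str]) -> NFA:
    """trim(A restricted to Q minus R) for R = removed."""
    return trim(restrict(a, a.states - frozenset(removed)))


# Two "tentacles" q0 -a-> q1 -b-> q2 and q0 -c-> q3 -d-> q4.
example = NFA(
    states=frozenset({"q0", "q1", "q2", "q3", "q4"}),
    delta={
        ("q0", "a"): frozenset({"q1"}), ("q1", "b"): frozenset({"q2"}),
        ("q0", "c"): frozenset({"q3"}), ("q3", "d"): frozenset({"q4"}),
    },
    initial=frozenset({"q0"}),
    final=frozenset({"q2", "q4"}),
)
pruned = prune(example, {"q1"})
```

Removing `q1` also makes `q2` unreachable, so trimming drops it as well: pruning a single state can shrink the automaton by more than one state.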

**Lemma 5.** *Consider an NFA $A$, a PA $P$, a bound on the number of states $n \in \mathbb{N}$, and an error bound $\epsilon \in \langle 0, 1 \rangle$. It is PSPACE-complete to determine whether there exists a subset of states $R \subseteq Q[A]$ of the size $|R| = n$ such that $d_P(A, A|_R) \le \epsilon$.*

Although Lemma 5 shows that the pruning reduction is as hard as a general reduction (cf. Lemma 2), the pruning reduction is more amenable to the use of heuristics like the greedy algorithms from Sects. 4.1 and 4.2. We instantiate *reduce*, *error* , and *label* in these high-level algorithms in the following way (the subscript p means *pruning*):

$$\mathit{reduce}_p(A, V) = \mathit{trim}(A|_{Q \setminus V}), \quad \mathit{error}_p(A, V, \ell) = \min_{V' \in \lfloor V \rfloor_p} \sum \left\{ \ell(q) \mid q \in V' \right\},$$

where $\lfloor V \rfloor_p$ is defined as follows. Because of the use of *trim* in $\mathit{reduce}_p$, for a pair of sets $V, V'$ s.t. $V \subset V'$, it holds that $\mathit{reduce}_p(A, V)$ may, in general, yield the same automaton as $\mathit{reduce}_p(A, V')$. Hence, we define a partial order $\sqsubseteq_p$ on $2^Q$ as $V_1 \sqsubseteq_p V_2$ iff $\mathit{reduce}_p(A, V_1) = \mathit{reduce}_p(A, V_2)$ and $V_1 \subseteq V_2$, and use $\lfloor V \rfloor_p$ to denote the set of minimal elements wrt $V$ and $\sqsubseteq_p$. The value of the approximation $\mathit{error}_p(A, V, \ell)$ is therefore the minimum of the sum of errors over all sets from $\lfloor V \rfloor_p$.

Note that the size of $\lfloor V \rfloor_p$ can again be exponential, and thus we employ a greedy approach for guessing an optimal $V'$. Clearly, this cannot affect the soundness of the algorithm, but only decreases the precision of the bound on the distance. Our experiments indicate that for automata appearing in NIDSes, this simplification has typically only a negligible impact on the precision of the bounds.

For computing the state labelling, we provide the following three functions, which differ in the precision they provide and the difficulty of their computation (naturally, more precise labellings are harder to compute): $\mathit{label}^1_p$, $\mathit{label}^2_p$, and $\mathit{label}^3_p$. Given an NFA $A$ and a PA $P$, they generate the labellings $\ell^1_p$, $\ell^2_p$, and $\ell^3_p$, respectively, defined as

$$\begin{aligned} \ell_p^1(q) &= \sum \left\{ \operatorname{Pr}_P(L_A^\flat(q')) \;\middle|\; q' \in \mathit{reach}(\{q\}) \cap F \right\}, \\ \ell_p^2(q) &= \operatorname{Pr}_P\left(L_A^\flat(F \cap \mathit{reach}(q))\right), \qquad \ell_p^3(q) = \operatorname{Pr}_P\left(L_A^\flat(q).L_A(q)\right). \end{aligned}$$

A state label $\ell(q)$ approximates the error of the words removed from $L(A)$ when $q$ is removed. More concretely, $\ell^1_p(q)$ is a rough estimate saying that the error can be bounded by the sum of probabilities of the banguages of all final states reachable from $q$ (in the worst case, all those final states might become unreachable). Note that $\ell^1_p(q)$ (1) counts the error of a word accepted in two different final states of $\mathit{reach}(q)$ twice, and (2) also considers words that are accepted in some final state in $\mathit{reach}(q)$ without going through $q$. The labelling $\ell^2_p$ deals with (1) by computing the total probability of the banguage of the set of all final states reachable from $q$, and the labelling $\ell^3_p$ in addition also deals with (2) by only considering words that traverse through $q$ (they can still be accepted in some final state not in $\mathit{reach}(q)$ though, so even $\ell^3_p$ is still imprecise). Note that if $A$ is unambiguous then $\ell^1_p = \ell^2_p$.

When computing the label of $q$, we first modify $A$ to obtain $A'$ accepting the language related to the particular labelling. Then, we compute the value of $\operatorname{Pr}_P(L(A'))$ using the algorithm from Sect. 3.1. Recall that this step is in general costly, due to the determinisation/disambiguation of $A'$. The key property of the labelling computation resides in the fact that if $A$ is composed of several disjoint sub-automata, the automaton $A'$ is typically much smaller than $A$ and thus the computation of the label is considerably less demanding. Since the automata appearing in regex matching for NIDS are composed of the union of "tentacles", the particular automata $A'$ are very small, which enables efficient component-wise computation of the labels.

The following lemma states the correctness of using the pruning reduction as an instantiation of Algorithms 1 and 2 and also the relation among $\ell^1_p$, $\ell^2_p$, and $\ell^3_p$.

**Lemma 6.** *For every $x \in \{1, 2, 3\}$, the functions $\mathit{reduce}_p$, $\mathit{error}_p$, and $\mathit{label}^x_p$ satisfy C1. Moreover, consider an NFA $A$, a PA $P$, and let $\ell^x_p = \mathit{label}^x_p(A, P)$ for $x \in \{1, 2, 3\}$. Then, for each $q \in Q[A]$, we have $\ell^1_p(q) \ge \ell^2_p(q) \ge \ell^3_p(q)$.*

#### **4.4 Self-loop Reduction**

The main idea of the self-loop reduction is to over-approximate the language of A by adding self-loops over every symbol at selected states. This makes some states of A redundant, allowing them to be removed without introducing any more error. Given an NFA A = (Q, δ, I, F), the self-loop reduction searches for a set of states <sup>R</sup> <sup>⊆</sup> <sup>Q</sup>, which will have self-loops added, and removes other transitions leading out of these states, making some states unreachable. The unreachable states are then removed.

Formally, let $\mathit{sl}(A, R)$ be the NFA $(Q, \delta', I, F)$ whose transition function $\delta'$ is defined, for all $p \in Q$ and $a \in \Sigma$, as $\delta'(p, a) = \{p\}$ if $p \in R$ and $\delta'(p, a) = \delta(p, a)$ otherwise. As with the pruning reduction, the natural decision problem corresponding to the self-loop reduction is also **PSPACE**-complete.
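A minimal Python sketch of the self-loop step follows; the function names and the dictionary encoding of $\delta$ are hypothetical. One point is an assumption on our side rather than part of the definition as printed above: the sketch also marks the selected states as accepting, because a state that loops on every symbol only enlarges the language if the looping runs can be accepted.

```python
from typing import Dict, Iterable, Set, Tuple

Delta = Dict[Tuple[str, str], Set[str]]  # (state, symbol) -> successors


def self_loop(
    states: Iterable[str],
    alphabet: Iterable[str],
    delta: Delta,
    final: Set[str],
    selected: Set[str],
) -> Tuple[Delta, Set[str]]:
    """Replace all outgoing transitions of selected states by self-loops."""
    symbols = list(alphabet)
    new_delta: Delta = {}
    for p in states:
        for a in symbols:
            if p in selected:
                new_delta[(p, a)] = {p}  # delta'(p, a) = {p}
            elif (p, a) in delta:
                new_delta[(p, a)] = set(delta[(p, a)])
    # ASSUMPTION (not in the printed definition): selected states accept.
    return new_delta, set(final) | set(selected)


def reachable(initial: Iterable[str], delta: Delta) -> Set[str]:
    """States reachable from the initial states; the rest can be removed."""
    seen, stack = set(initial), list(initial)
    while stack:
        p = stack.pop()
        for (src, _a), succs in delta.items():
            if src == p:
                for q in succs - seen:
                    seen.add(q)
                    stack.append(q)
    return seen


# Chain q0 -a-> q1 -b-> q2 -a-> q3 (final) over the alphabet {a, b}.
chain: Delta = {("q0", "a"): {"q1"}, ("q1", "b"): {"q2"}, ("q2", "a"): {"q3"}}
looped, accepting = self_loop(
    ["q0", "q1", "q2", "q3"], ["a", "b"], chain, {"q3"}, {"q1"}
)
kept = reachable({"q0"}, looped)
```

After adding self-loops at `q1`, the states `q2` and `q3` are no longer reachable and can be dropped, so the four-state chain shrinks to two states at the price of accepting every word that starts with `a`.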

**Lemma 7.** *Consider an NFA $A$, a PA $P$, a bound on the number of states $n \in \mathbb{N}$, and an error bound $\epsilon \in \langle 0, 1 \rangle$. It is PSPACE-complete to determine whether there exists a subset of states $R \subseteq Q[A]$ of the size $|R| = n$ such that $d_P(A, \mathit{sl}(A, R)) \le \epsilon$.*

The required functions in the error- and size-driven reduction algorithms are instantiated in the following way (the subscript *sl* means *self-loop*):

$$\mathit{reduce}_{sl}(A, V) = \mathit{trim}(\mathit{sl}(A, V)), \quad \mathit{error}_{sl}(A, V, \ell) = \sum \left\{ \ell(q) \mid q \in \min\left( \lfloor V \rfloor_{sl} \right) \right\},$$

where $\lfloor V \rfloor_{sl}$ is defined in a similar manner as $\lfloor V \rfloor_p$ in the previous section (using a partial order $\sqsubseteq_{sl}$ defined similarly to $\sqsubseteq_p$; in this case, the order $\sqsubseteq_{sl}$ has a single minimal element, though).

The functions $\mathit{label}^1_{sl}$, $\mathit{label}^2_{sl}$, and $\mathit{label}^3_{sl}$ compute the state labellings $\ell^1_{sl}$, $\ell^2_{sl}$, and $\ell^3_{sl}$ for an NFA $A$ and a PA $P$ defined as follows:

$$\begin{aligned} \ell_{sl}^1(q) &= \mathit{weight}_P(L_A^\flat(q)), & \ell_{sl}^2(q) &= \operatorname{Pr}_P\left(L_A^\flat(q).\Sigma^*\right), \\ \ell_{sl}^3(q) &= \ell_{sl}^2(q) - \operatorname{Pr}_P\left(L_A^\flat(q).L_A(q)\right). \end{aligned}$$

Above, $\mathit{weight}_P(w)$ for a PA $P = (\boldsymbol{\alpha}, \boldsymbol{\gamma}, \{\boldsymbol{\Delta}_a\}_{a \in \Sigma})$ and a word $w \in \Sigma^*$ is defined as $\mathit{weight}_P(w) = \boldsymbol{\alpha} \cdot \boldsymbol{\Delta}_w \cdot \mathbf{1}$ (i.e., similarly as $\operatorname{Pr}_P(w)$ but with the final weights $\boldsymbol{\gamma}$ discarded), and $\mathit{weight}_P(L)$ for $L \subseteq \Sigma^*$ is defined as $\mathit{weight}_P(L) = \sum_{w \in L} \mathit{weight}_P(w)$.

Intuitively, the state labelling $\ell^1_{sl}(q)$ computes the probability that $q$ is reached from an initial state, so if $q$ is pumped up with all possible word endings, this is the maximum possible error introduced by the added word endings. This has the following sources of imprecision: (1) the probability of some words may be included twice, e.g., when $L^\flat_A(q) = \{a, ab\}$, the probabilities of all words from $\{ab\}.\Sigma^*$ are included twice in $\ell^1_{sl}(q)$ because $\{ab\}.\Sigma^* \subseteq \{a\}.\Sigma^*$, and (2) $\ell^1_{sl}(q)$ can also contain probabilities of words that are already accepted on a run traversing $q$. The state labelling $\ell^2_{sl}$ deals with (1) by considering the probability of the language $L^\flat_A(q).\Sigma^*$, and $\ell^3_{sl}$ deals also with (2) by subtracting from the result of $\ell^2_{sl}$ the probabilities of the words that pass through $q$ and are accepted.

The computation of the state labellings for the self-loop reduction is done in a similar way as the computation of the state labellings for the pruning reduction (cf. Sect. 4.3). For a computation of $\mathit{weight}_P(L)$ one can use the same algorithm as for $\operatorname{Pr}_P(L)$, only the final vector for PA $P$ is set to $\mathbf{1}$. The correctness of Algorithms 1 and 2 when instantiated using the self-loop reduction is stated in the following lemma.
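To make the difference between $\operatorname{Pr}_P(w)$ and $\mathit{weight}_P(w)$ concrete for a single word, here is a small Python sketch. The two-state PA is an invented example, and the matrix form $\operatorname{Pr}_P(w) = \boldsymbol{\alpha} \cdot \boldsymbol{\Delta}_w \cdot \boldsymbol{\gamma}$ is assumed from the remark that the weight is the same product with $\boldsymbol{\gamma}$ replaced by $\mathbf{1}$.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def vec_mat(v: Vector, m: Matrix) -> Vector:
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def run(alpha: Vector, deltas: Dict[str, Matrix], word: str) -> Vector:
    """alpha * Delta_w: the distribution over states after reading `word`."""
    v = list(alpha)
    for sym in word:
        v = vec_mat(v, deltas[sym])
    return v


def prob(alpha: Vector, gamma: Vector, deltas: Dict[str, Matrix], word: str) -> float:
    """Pr_P(w) = alpha * Delta_w * gamma."""
    return sum(x * g for x, g in zip(run(alpha, deltas, word), gamma))


def weight(alpha: Vector, deltas: Dict[str, Matrix], word: str) -> float:
    """weight_P(w) = alpha * Delta_w * 1 (final weights discarded)."""
    return sum(run(alpha, deltas, word))


# Invented two-state PA; in each state, outgoing mass plus gamma equals 1.
ALPHA = [1.0, 0.0]
GAMMA = [0.25, 0.5]
DELTAS = {
    "a": [[0.5, 0.0], [0.0, 0.5]],
    "b": [[0.0, 0.25], [0.0, 0.0]],
}
```

For the word `ab`, the sketch gives a weight of 0.125 (the probability of reading the prefix `ab`) and a probability of 0.0625 (reading `ab` and then stopping), which illustrates why the weight is the right quantity for a state that accepts every continuation.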

**Lemma 8.** *For every $x \in \{1, 2, 3\}$, the functions $\mathit{reduce}_{sl}$, $\mathit{error}_{sl}$, and $\mathit{label}^x_{sl}$ satisfy C1. Moreover, consider an NFA $A$, a PA $P$, and let $\ell^x_{sl} = \mathit{label}^x_{sl}(A, P)$ for $x \in \{1, 2, 3\}$. Then, for each $q \in Q[A]$, we have $\ell^1_{sl}(q) \ge \ell^2_{sl}(q) \ge \ell^3_{sl}(q)$.*

### **5 Reduction of NFAs in Network Intrusion Detection Systems**

We have implemented our approach in a Python prototype named Appreal (APProximate REduction of Automata and Languages)<sup>4</sup> and evaluated it on the use case of network intrusion detection using Snort [1], a popular open source NIDS. The version of Appreal used for the evaluation in the current paper is available as an artifact [44] for the TACAS'18 artifact virtual machine [45].

#### **5.1 Network Traffic Model**

The reduction we describe in this paper is driven by a probabilistic model representing a distribution over $\Sigma^*$, and the formal guarantees are also wrt this model. We use *learning* to obtain a model of network traffic over the 8-bit ASCII alphabet at a given network point. Our model is created from several gigabytes of network traffic from a measuring point of the CESNET Internet provider connected to a 100 Gbps backbone link (unfortunately, we cannot provide the traffic dump since it may contain sensitive data).

Learning a PA representing the network traffic faithfully is hard. The PA cannot be too specific: although the number of different packets that can occur is finite, it is still extremely large (a conservative estimate assuming the most common scenario Ethernet/IPv4/TCP would still yield a number over $2^{10,000}$). If we assigned non-zero probabilities only to the packets from the dump (which are less than $2^{20}$), the obtained model would completely ignore virtually all packets that might appear on the network, and, moreover, the model would also be very large (millions of states), making it difficult to use in our algorithms. A generalization of the obtained traffic is therefore needed.

<sup>4</sup> https://github.com/vhavlena/appreal/tree/tacas18.

A natural solution is to exploit results from the area of PA learning, such as [39,40]. Indeed, we experimented with the use of Alergia [39], a learning algorithm that constructs a PA from a prefix tree (where edges are labelled with multiplicities) by merging nodes that are "similar." The automata that we obtained were, however, *too* general. In particular, the constructed automata destroyed the structure of network protocols—the merging was too permissive and the generalization merged distant states, which introduced loops over a very large substructure in the automaton (such a case usually does not correspond to the design of network protocols). As a result, the obtained PA more or less represented the Poisson distribution, having essentially no value for us.

In Sect. 5.2, we focus on the detection of malicious traffic transmitted over HTTP. We take advantage of this fact and create a PA representing the traffic while taking into account the structure of HTTP. We start by manually creating a DFA that represents the high-level structure of HTTP. Then, we proceed by feeding 34,191 HTTP packets from our sample into the DFA, at the same time taking notes about how many times every state is reached and how many times every transition is taken. The resulting PA $P_{HTTP}$ (of 52 states) is then obtained from the DFA and the labels in the obvious way.
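One plausible reading of "the obvious way" is a frequency estimate: the probability of a transition is the number of times it was taken divided by the number of visits to its source state, and the stopping probability of a state is the fraction of visits at which a packet ended there. The Python sketch below implements this reading; it is an assumption, not the procedure documented for Appreal, and the tiny DFA and samples are invented.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

DfaDelta = Dict[Tuple[str, str], str]  # (state, symbol) -> next state


def trace(delta: DfaDelta, initial: str, word: str) -> Optional[List[str]]:
    """States visited while reading `word`, or None if the DFA gets stuck."""
    path = [initial]
    for sym in word:
        nxt = delta.get((path[-1], sym))
        if nxt is None:
            return None
        path.append(nxt)
    return path


def estimate_pa(delta: DfaDelta, initial: str, samples: Iterable[str]):
    """Frequency-based transition and stopping probabilities (assumed method)."""
    visits: Counter = Counter()
    taken: Counter = Counter()
    ended: Counter = Counter()
    for word in samples:
        path = trace(delta, initial, word)
        if path is None:
            continue  # ignore samples that do not fit the DFA
        visits.update(path)
        taken.update(zip(path, word))  # (source state, symbol) pairs
        ended[path[-1]] += 1
    trans = {(p, a, delta[(p, a)]): c / visits[p] for (p, a), c in taken.items()}
    stop = {p: ended[p] / visits[p] for p in visits}
    return trans, stop


# Invented DFA: s0 -g-> s1, s1 -x-> s1, s1 -n-> s2.
TOY_DFA: DfaDelta = {("s0", "g"): "s1", ("s1", "x"): "s1", ("s1", "n"): "s2"}
trans, stop = estimate_pa(TOY_DFA, "s0", ["gn", "gxn", "gxxn"])
```

On the three invented samples, state `s1` is visited six times, loops on `x` three times, and leaves on `n` three times, so both transitions get probability 0.5. In every state, the outgoing probabilities and the stopping probability sum to 1.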

The described method yields automata that are much better than those obtained using Alergia in our experiments. A disadvantage of the method is that it is only semi-automatic—the basic DFA needed to be provided by an expert. We have yet to find an algorithm that would suit our needs for learning more general network traffic.

#### **5.2 Evaluation**

We start this section by introducing the experimental setting, namely, the integration of our reduction techniques into the tool chain implementing efficient regex matching, the concrete settings of Appreal, and the evaluation environment. Afterwards, we discuss the results evaluating the quality of the obtained approximate reductions as well as of the provided error bounds. Finally, we present the performance of our approach and discuss its key aspects. Due to the lack of space, we selected the most interesting results demonstrating the potential as well as the limitations of our approach.

**General Setting.** Snort detects malicious network traffic based on *rules* that contain *conditions*. The conditions may take into consideration, among others, network addresses, ports, or Perl compatible regular expressions (PCREs) that the packet payload should match. In our evaluation, we always select a subset of Snort rules, extract the PCREs from them, and use Netbench [20] to transform them into a single NFA $A$. Before applying Appreal, we use the state-of-the-art NFA reduction tool Reduce [41] to decrease the size of $A$. Reduce performs a language-preserving reduction of $A$ using advanced variants of simulation [31] (in the experiment reported in Table 3, we skip the use of Reduce at this step as discussed in the performance evaluation). The automaton $A^{Red}$ obtained as the result of Reduce is the input of Appreal, which performs one of the approximate reductions from Sect. 4 wrt the traffic model $P_{HTTP}$, yielding $A^{App}$. After the approximate reduction, we use Reduce one more time and obtain the result $A'$.

**Settings of APPREAL**. In the use case of NIDS pre-filtering, it may be important to never introduce a false negative, i.e., to never drop a malicious packet. Therefore, we focus our evaluation on the *self-loop reduction* (Sect. 4.4). In particular, we use the state labelling function $\mathit{label}^2_{sl}$, since it provides a good trade-off between the precision and the computational demands (recall that the computation of $\mathit{label}^2_{sl}$ can exploit the "tentacle" structure of the NFAs we work with). We give more attention to the *size-driven reduction* (Sect. 4.1) since, in our setting, a bound on the available FPGA resources is typically given and the task is to create an NFA with the smallest error that fits inside. The order $\preceq_{A, \ell^2_{sl}}$ over states used in Sects. 4.1 and 4.2 is defined as $s \preceq_{A, \ell^2_{sl}} s' \Leftrightarrow \ell^2_{sl}(s) \le \ell^2_{sl}(s')$.

**Evaluation Environment.** All experiments run on a 64-bit Linux Debian workstation with the Intel Core(TM) i5-661 CPU running at 3.33 GHz with 16 GiB of RAM.

**Description of Tables.** In the caption of every table, we provide the name of the input file (in the directory regexps/tacas18/ of the repository of Appreal) with the selection of Snort regexes used in the particular experiment, together with the type of the reduction (size- or error-driven). All reductions are over-approximating (self-loop reduction). We further provide the size of the input automaton $|A|$, the size after the initial processing by Reduce ($|A^{Red}|$), and the time of this reduction (*time*(Reduce)). Finally, we list the times of computing the state labelling $\mathit{label}^2_{sl}$ on $A^{Red}$ (*time*($\mathit{label}^2_{sl}$)), the exact probabilistic distance (*time*(Exact)), and also the number of *look-up tables* (*LUTs*($A^{Red}$)) consumed on the targeted FPGA (Xilinx Virtex 7 H580T) when $A^{Red}$ was synthesized (more on this in Sect. 5.3). The meaning of the columns in the tables is the following:


**Table 1.** Results for the http-malicious regex, $|A_{mal}| = 249$, $|A^{Red}_{mal}| = 98$, *time*(Reduce) = 3.5 s, *time*($\mathit{label}^2_{sl}$) = 38.7 s, *time*(Exact) = 3.8–6.5 s, and *LUTs*($A^{Red}_{mal}$) = 382.


**Error bound** shows the estimation of the error of $A'$ as determined by the reduction itself, i.e., it is the bound on the probabilistic distance computed by the function *error* in Sect. 4.


#### **Approximation Errors**

Table 1 presents the results of the self-loop reduction for the NFA $A_{mal}$ describing http-malicious regexes. We can observe that the differences between the upper bounds on the probabilistic distance and its real value are negligible (typically in the order of $10^{-4}$ or less). We can also see that the probabilistic distance agrees with the traffic error. This indicates a good quality of the traffic model employed in the reduction process. Further, we can see that our approach can provide useful trade-offs between the reduction error and the reduction factor. Finally, Table 1 shows that a significant reduction is obtained when the error threshold is increased from 0.04 to 0.07.

Table 2 presents the results of the size-driven self-loop reduction for NFA $A_{att}$ describing http-attacks regexes. We can observe that the error bounds again provide a very good approximation of the real probabilistic distance. On the other hand, the difference between the probabilistic distance and the traffic error is larger than for $A_{mal}$. Since all experiments use the same probabilistic automaton and the same traffic, this discrepancy is attributed to the different set of packets that are incorrectly accepted by $A^{Red}_{att}$. If the probability of these packets is adequately captured in the traffic model, the difference between the distance and the traffic error is small and vice versa. This also explains an even larger difference in Table 3 (presenting the results for $A_{bd}$ constructed from http-backdoor regexes) for $k \in \langle 0.2, 0.4 \rangle$. Here, the traffic error is very small and caused by a small set of packets (approx. 70), whose probability is not correctly captured in the traffic model. Despite this problem, the results clearly show that our approach still provides significant reductions while keeping the traffic error small: about a 5-fold reduction is obtained for the traffic error 0.03 % and a 10-fold reduction is obtained for the traffic error 6.3 %. We discuss the practical impact of such a reduction in Sect. 5.3.

**Table 2.** Results for the http-attacks regex, size-driven reduction, $|A_{att}| = 142$, $|A^{Red}_{att}| = 112$, *time*(Reduce) = 7.9 s, *time*($\mathit{label}^2_{sl}$) = 28.3 min, *time*(Exact) = 14.0–16.4 min.

#### **Performance of the Approximate Reduction**

In all our experiments (Tables 1, 2 and 3), we can observe that the most time-consuming step of the reduction process is the computation of state labellings (it takes at least 90 % of the total time). The crucial observation is that the structure of the NFAs fundamentally affects the performance of this step. Although after Reduce, the size of $A_{mal}$ is very similar to the size of $A_{att}$, computing $\mathit{label}^2_{sl}$ for $A_{att}$ takes much more time (28.3 min vs. 38.7 s). The key reason behind this slowdown is the determinisation (or alternatively disambiguation) process required by the product construction underlying the state labelling computation (cf. Sect. 4.4). For $A_{att}$, the process results in a significantly larger product when compared to the product for $A_{mal}$. The size of the product directly determines the time and space complexity of solving the linear equation system required for computing the state labelling.

As explained in Sect. 4, the computation of the state labelling $\mathit{label}^2_{sl}$ can exploit the "tentacle" structure of the NFAs appearing in NIDSes and thus can be done component-wise. On the other hand, our experiments reveal that the use of Reduce typically breaks this structure and thus the component-wise computation cannot be effectively used. For the NFA $A_{mal}$, this behaviour does not have any major performance impact as the determinisation leads to a moderate-sized automaton and the state labelling computation takes less than 40 s. On the other hand, this behaviour has a dramatic effect for the NFA $A_{att}$. By disabling the initial application of Reduce and thus preserving the original structure of $A_{att}$, we were able to speed up the state label computation from 28.3 min to 1.5 min. Note that other steps of the approximate reduction took a similar time as before disabling Reduce and also that the trade-offs between the error and the reduction factor were similar. Surprisingly, disabling Reduce made the computation of the exact probabilistic distance computationally infeasible because the determinisation ran out of memory.

Due to the size of the NFA $A_{bd}$, the impact of disabling the initial application of Reduce is even more fundamental. In particular, computing the state labelling took only 19.9 min, in contrast to running out of memory when Reduce is applied in the first step (therefore, the input automaton is not processed by Reduce in Table 3; we still give the number of LUTs of its reduced version for comparison, though). Note that the size of $A_{bd}$ also slows down other reduction steps (the greedy algorithm and the final Reduce reduction). We can, however, clearly see that computing the state labelling is still the most time-consuming step.

#### **5.3 The Real Impact in an FPGA-Accelerated NIDS**

Further, we also evaluated some of the obtained automata in the setting of [5] implementing a high-speed NIDS pre-filter. In that setting, the amount of resources available for the regex matching engine is 15,000 LUTs<sup>5</sup> and the frequency of the engine is 200 MHz. We synthesized NFAs that use a 32-bit-wide data path, corresponding to processing 4 ASCII characters at once, which is, according to the analysis in [5], the best trade-off between the utilization of the chip resources and the maximum achievable frequency. A simple analysis shows that the throughput of one automaton is 6.4 Gbps, so in order to reach the desired link speed of 100 Gbps, 16 units are required, and 63 units are needed to handle 400 Gbps. With the given amount of LUTs, we are therefore bounded by 937 LUTs per unit for 100 Gbps and 238 LUTs per unit for 400 Gbps.
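The arithmetic behind these figures can be reproduced with a few lines of Python. The constants are the ones stated above; the function names are ours, and the sketch assumes that one unit processes one full data-path word per clock cycle.

```python
DATA_PATH_BITS = 32        # width of the data path (4 ASCII characters)
CLOCK_HZ = 200_000_000     # engine frequency: 200 MHz
LUT_BUDGET = 15_000        # LUTs available for the regex matching engine

# Throughput of a single unit in bits per second: 32 * 200 MHz = 6.4 Gbps.
UNIT_BPS = DATA_PATH_BITS * CLOCK_HZ


def units_needed(link_gbps: int) -> int:
    """Smallest number of parallel units whose total throughput covers the link."""
    link_bps = link_gbps * 1_000_000_000
    return -(-link_bps // UNIT_BPS)  # integer ceiling division


def luts_per_unit(link_gbps: int) -> int:
    """LUT budget left for each unit when all required units must fit."""
    return LUT_BUDGET // units_needed(link_gbps)
```

For a 100 Gbps link this yields 16 units with at most 937 LUTs each, and for 400 Gbps it yields 63 units with at most 238 LUTs each, matching the bounds in the text.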

We focused on the consumption of LUTs by an implementation of the regex matching engines for http-backdoor ($A^{Red}_{bd}$) and http-malicious ($A^{Red}_{mal}$).

– **100 Gbps:** For this speed, $A^{Red}_{mal}$ can be used without any approximate reduction as it is small enough to fit in the available space. On the other hand, $A^{Red}_{bd}$ without the approximate reduction is way too large to fit (at most 6 units fit inside the available space, yielding the throughput of only 38.4 Gbps, which is unacceptable). The column **LUTs** in Table 3 shows that using our framework, we are able to reduce $A^{Red}_{bd}$ such that it uses 894 LUTs (for $k = 0.3$), and so all the needed 16 units fit into the FPGA, yielding the throughput over 100 Gbps and the theoretical error bound of a false positive $\le 3.4 \times 10^{-8}$ wrt the model $P_{HTTP}$.

<sup>5</sup> We omit the analysis of flip-flop consumption because in our setting it is dominated by the LUT consumption.

– **400 Gbps:** Regex matching at this speed is extremely challenging. The only reduced version of $A^{Red}_{bd}$ that fits in the available space is the one for the value $k = 0.1$ with the error bound almost 1. The situation is better for $A^{Red}_{mal}$. In the exact version, at most 39 units can fit inside the FPGA with the maximum throughput of 249.6 Gbps. On the other hand, when using our approximate reduction framework, we are able to place 63 units into the FPGA, each of the size 224 LUTs ($k = 0.6$) with the throughput over 400 Gbps and the theoretical error bound of a false positive $\le 8.7 \times 10^{-8}$ wrt the model $P_{HTTP}$.

### **6 Conclusion**

We have proposed a novel approach for approximate reduction of NFAs used in network traffic filtering. Our approach is based on a probabilistic distance between the original and the reduced automaton, defined using a probabilistic model of the input network traffic, which characterizes the significance of particular packets. We characterized the computational complexity of approximate reductions based on the described distance and proposed a sequence of heuristics allowing one to perform the approximate reduction in an efficient way. Our experimental results are quite encouraging and show that we can often achieve a very significant reduction for a negligible loss of precision. We showed that using our approach, FPGA-accelerated network filtering at high traffic speeds can be applied to regexes of malicious traffic where it could not be applied before.

In the future, we plan to investigate other approximate reductions of the NFAs, maybe using some variant of abstraction from abstract regular model checking [42], adapted for the given probabilistic setting. Another important issue for the future is to develop better ways of learning a suitable probabilistic model of the input traffic.

*Data Availability Statement and Acknowledgements.* The tool used for the experimental evaluation in the current study is available in the following figshare repository: https://doi.org/10.6084/m9.figshare.5907055.v1. We thank Jan Kořenek, Vlastimil Košař, and Denis Matoušek for their help with translating regexes into automata and synthesis of FPGA designs, and Martin Žádník for providing us with the backbone network traffic. We thank Stefan Kiefer for helping us prove the **PSPACE** part of Lemma 1 and Petr Peringer for testing our artifact. The work on this paper was supported by the Czech Science Foundation project 16-17538S, the IT4IXS: IT4Innovations Excellence in Science project (LQ1602), and the FIT BUT internal project FIT-S-17-4014.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Validity-Guided Synthesis of Reactive Systems from Assume-Guarantee Contracts**

Andreas Katis<sup>1(B)</sup>, Grigory Fedyukovich<sup>2</sup>, Huajun Guo<sup>1</sup>, Andrew Gacek<sup>3</sup>, John Backes<sup>3</sup>, Arie Gurfinkel<sup>4</sup>, and Michael W. Whalen<sup>1</sup>

> <sup>1</sup> Department of Computer Science and Engineering, University of Minnesota, Minneapolis, USA
> {katis001,guoxx663}@umn.edu, whalen@cs.umn.edu
>
> <sup>2</sup> Department of Computer Science, Princeton University, Princeton, USA
> grigoryf@cs.princeton.edu
>
> <sup>3</sup> Rockwell Collins Advanced Technology Center, Cedar Rapids, USA
> {andrew.gacek,john.backes}@rockwellcollins.com
>
> <sup>4</sup> Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Canada
> agurfinkel@uwaterloo.ca

**Abstract.** Automated synthesis of reactive systems from specifications has been a topic of research for decades. Recently, a variety of approaches have been proposed to extend synthesis of reactive systems from propositional specifications towards specifications over rich theories. We propose a novel, completely automated approach to program synthesis which reduces the problem to deciding the validity of a set of ∀∃-formulas. In the spirit of IC3/PDR, our problem space is recursively refined by blocking out regions of unsafe states, aiming to discover a fixpoint that describes safe reactions. If such a fixpoint is found, we construct a witness that is directly translated into an implementation. We implemented the algorithm on top of the JKind model checker, and exercised it against contracts written using the Lustre specification language. Experimental results show that the new algorithm outperforms JKind's existing synthesis procedure based on k-induction and addresses soundness issues in the k-inductive approach with respect to unrealizable results.

### **1 Introduction**

Program synthesis is one of the most challenging problems in computer science. The objective is to define a process to automatically derive implementations that are guaranteed to comply with specifications expressed in the form of logic formulas. The problem has seen increased popularity in recent years, mainly due to the capabilities of modern symbolic solvers, including Satisfiability Modulo Theories (SMT) [1] tools, to compute compact and precise regions that describe under which conditions an implementation exists for the given specification [25]. As a result, the problem has been well-studied in the area of propositional specifications (see Gulwani [15] for a survey), and approaches have been proposed to tackle challenges involving richer specifications. Template-based techniques focus on synthesizing programs that match a certain shape (the template) [28], while *inductive synthesis* uses the idea of refining the problem space using counterexamples, to converge to a solution [12]. A different category is that of *functional synthesis*, in which the goal is to construct functions from pre-defined input/output relations [22].

Our goal is to effectively synthesize programs from safety specifications written in the Lustre [18] language. These specifications are structured in the form of *Assume-Guarantee* contracts, similarly to approaches in Linear Temporal Logic [11]. In prior work, we developed a solution to the synthesis problem which is based on k-induction [14,19,21]. Despite showing good results, the approach suffers from soundness problems with respect to unrealizable results: a contract could be declared unrealizable even though an actual implementation exists. In this work, we propose a novel approach that is a direct improvement over the k-inductive method in two important aspects: performance and generality. On all models that can be synthesized by k-induction, the new algorithm always outperforms it in terms of synthesis time while yielding roughly comparable code sizes and execution times for the generated code. More importantly, the new algorithm can synthesize a strictly larger set of benchmark models, and comes with an improved termination guarantee: unlike in k-induction, if the algorithm terminates with an "unrealizable" result, then there is no possible realization of the contract.

The technique has been used to synthesize contracts involving linear real and integer arithmetic (LIRA), but remains generic enough to be extended into supporting additional theories in the future, as well as to liveness properties that can be reduced to safety properties (as in k-liveness [7]). Our approach is completely automated and requires no guidance to the tools in terms of user interaction (unlike [26,27]), and it is capable of providing solutions without requiring any templates, as in e.g., work by Beyene et al. [2]. We were able to automatically solve problems that were "hard" and required hand-written templates specialized to the problem in [2].

The main idea of the algorithm was inspired by induction-based model checking, and in particular by IC3/Property Directed Reachability (PDR) [4,9]. In PDR, the goal is to discover an inductive invariant for a property, by recursively blocking generalized regions describing unsafe states. Similarly, we attempt to reach a greatest fixpoint that contains states that react to arbitrary environment behavior and lead to states within the fixpoint that comply with all guarantees. Formally, the greatest fixpoint is sufficient to prove the validity of a ∀∃-formula, which states that for any state and environment input, there exists a system reaction that complies with the specification. Starting from the entire problem space, we recursively block regions of states that violate the contract, using *regions of validity* that are generated by invalid ∀∃-formulas. If the refined ∀∃-formula is valid, we reach a fixpoint which can effectively be used by the specified transition relation to provide safe reactions to environment inputs. We then extract a witness for the formula's satisfiability, which can be directly transformed into the language intended for the system's implementation.

The algorithm was implemented as a feature in the JKind model checker and is based on the general concept of extracting a witness that satisfies a ∀∃-formula, using the AE-VAL Skolemizer [10,19]. While AE-VAL was mainly used as a tool for solving queries and extracting Skolems in our k-inductive approach, in this paper we also take advantage of its capability to generate *regions of validity* from invalid formulas to reach a fixpoint of satisfiable assignments to state variables.

The contributions of the paper are therefore:


The rest of the paper is organized as follows. Section 2 briefly describes the Cinderella-Stepmother problem that we use as an example throughout the paper. In Sect. 3, we provide the necessary formal definitions to describe the synthesis algorithm, which is then presented in Sect. 4. In Sect. 5, we present an evaluation and a comparison against an existing method based on k-induction that uses the same input language. Finally, we discuss the differences between our work and closely related ideas in Sect. 6 and conclude in Sect. 7.

### **2 Overview: The Cinderella-Stepmother Game**

We illustrate the flow of the validity-guided synthesis algorithm using a variation of the minimum-backlog problem, the two-player game between Cinderella and her wicked Stepmother, first expressed by Bodlaender *et al.* [3].

The main objective for Cinderella (i.e. the reactive system) is to prevent a collection of buckets from overflowing with water. On the other hand, Cinderella's Stepmother (i.e. the system's environment) refills the buckets with a predefined amount of water that is distributed in a random fashion between the buckets. For the running example, we chose an instance of the game that has been previously used in template-based synthesis [2]. In this instance, the game is described using five buckets, where each bucket can contain up to two units of water. Cinderella has the option to empty two adjacent buckets at each of her turns, while the Stepmother distributes one unit of water over all five buckets. In the context of this paper we use this example to show how specification is expressed, as well as how we can synthesize an efficient implementation that describes reactions for Cinderella, such that a bucket overflow is always prevented.

**Fig. 1.** An Assume-Guarantee contract.

We represent the system requirements using an *Assume-Guarantee Contract*. The *assumptions* of the contract restrict the possible inputs that the environment can provide to the system, while the *guarantees* describe safe reactions of the system to the outside world.

A (conceptually) simple example is shown in Fig. 1. The contract describes a possible set of requirements for a specific instance of the Cinderella-Stepmother game. Our goal is to synthesize an implementation that describes Cinderella's winning region of the game. Cinderella in this case is the implementation, as shown by the middle box in Fig. 1. Cinderella's inputs are five different values $i_k$, $1 \leq k \leq 5$, determined by a random distribution of one unit of water by the Stepmother. During each of her turns Cinderella has to make a choice denoted by the output variable $e$, such that the buckets $b_k$ do not overflow during the next action of her Stepmother. We define the contract using the set of assumptions $A$ (left box in Fig. 1) and the guarantee constraints $G$ (right box in Fig. 1). For the particular example, it is possible to construct at least one implementation that satisfies $G$ given $A$, which is described in Sect. 4.3. The proof of existence of such an implementation is the main concept behind the *realizability* problem, while the automated construction of a witness implementation is the main focus of *program synthesis*.

Given a proof of realizability of the contract in Fig. 1, we seek an efficient synthesis procedure that can provide an implementation. On the other hand, consider a variation of the example, where A = *true*. This is a practical case of an *unrealizable* contract, as there is no feasible Cinderella implementation that can correctly react to Stepmother's actions. A counterexample allows the Stepmother to pour random amounts of water into the buckets, leading to overflow of at least one bucket during each of her turns.

#### **3 Background**

We use two disjoint sets, state and inputs, to describe a system. A straightforward and intuitive way to represent an *implementation* is by defining a *transition system*, composed of an initial state predicate $I(s)$ of type state → bool, as well as a transition relation $T(s, i, s')$ of type state → inputs → state → bool.

Combining the above, we represent an Assume-Guarantee (AG) contract using a set of *assumptions*, $A$ : state → inputs → bool, and a set of *guarantees* $G$. The latter is further decomposed into two distinct subsets $G_I$ : state → bool and $G_T$ : state → inputs → state → bool. $G_I$ defines the set of valid initial states, and $G_T$ contains constraints that need to be satisfied in every transition between two states. Importantly, we do not make any distinction between the internal state variables and the output variables in the formalism. This allows us to use the state variables to (in some cases) simplify the specification of guarantees since a contract might not always be defined over all variables in the transition system.

Consequently, we can formally define a realizable contract as one for which any preceding state $s$ can transition into a new state $s'$ that satisfies the guarantees, assuming valid inputs. For a system to be ever-reactive, these new states $s'$ should be further usable as preceding states in a future transition. States like $s$ and $s'$ are called *viable* if and only if:

$$\mathsf{Viable}(s) = \forall i.\ (A(s, i) \Rightarrow \exists s'.\ G_T(s, i, s') \land \mathsf{Viable}(s')) \tag{1}$$

This equation is recursive and we interpret it coinductively, i.e., as a greatest fixpoint. A necessary condition, finally, is that the intersection of sets of viable states and initial states is non-empty. As such, to conclude that a contract is realizable, we require that

$$\exists s.\ G_I(s) \land \mathsf{Viable}(s) \tag{2}$$

The synthesis problem is therefore to determine an initial state $s_i$ and function $f(s, i)$ such that $G_I(s_i)$ and $\forall s, i.\ \mathsf{Viable}(s) \Rightarrow \mathsf{Viable}(f(s, i))$.
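To make the coinductive reading of Eq. 1 concrete, the following is a minimal sketch that computes the set of viable states on a small finite model by downward iteration. It is only an illustration under stated assumptions: the explicit-state enumeration, the function name `viable_states`, and the toy tank contract are hypothetical and are not part of the approach described in this paper, which works symbolically over infinite-state theories.

```python
def viable_states(states, inputs, A, G_T):
    """Greatest fixpoint of Eq. 1 on a finite model, by downward iteration."""
    viable = set(states)  # start from all states and shrink
    while True:
        # Keep s if every admissible input has a successor that is still viable.
        keep = {
            s for s in viable
            if all(not A(s, i) or any(G_T(s, i, t) for t in viable)
                   for i in inputs)
        }
        if keep == viable:
            return viable
        viable = keep


# Hypothetical toy contract: a tank level in 0..3; the environment adds 0 or 1
# unit per step, the system may drain one unit, and the next level must be <= 2.
def A(s, i):
    return True


def G_T(s, i, t):
    return t in (s + i, s + i - 1) and 0 <= t <= 2


def G_I(s):
    return s == 0


V = viable_states(range(4), (0, 1), A, G_T)
realizable = any(G_I(s) for s in V)  # the check of Eq. 2
print(sorted(V), realizable)  # prints: [0, 1, 2] True
```

In this toy model, level 3 is removed in the first round because no reaction keeps the next level within bounds when one more unit arrives; the remaining levels are stable, and the initial level 0 is among them.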

The intuition behind our proposed algorithm in this paper relies on the discovery of a fixpoint F that only contains viable states. We can determine whether F is a fixpoint by proving the validity of the following formula:

$$\forall s, i.\ (F(s) \land A(s, i) \Rightarrow \exists s'.\ G_T(s, i, s') \land F(s'))$$

In the case where the greatest fixpoint $F$ is non-empty, we check whether it satisfies $G_I$ for some initial state. If so, we proceed by extracting a witnessing initial state and a witnessing Skolem function $f(s, i)$ to determine $s'$ that is, by construction, guaranteed to satisfy the specification.

To achieve both the fixpoint generation and the witness extraction, we depend on AE-VAL, a solver for ∀∃-formulas.

#### **3.1 Skolem Functions and Regions of Validity**

We rely on AE-VAL [10], an established algorithm that decides the validity of ∀∃-formulas and extracts Skolem functions. It takes as input a formula $\forall x.\ \exists y.\ \Phi(x, y)$ where $\Phi(x, y)$ is quantifier-free. To decide its validity, AE-VAL first normalizes $\Phi(x, y)$ to the form $S(x) \Rightarrow T(x, y)$ and then attempts to extend all models of $S(x)$ to models of $T(x, y)$. If such an extension is possible, then the input formula is valid, and a relationship between $x$ and $y$ is gathered in a Skolem function. Otherwise the formula is invalid, and no Skolem function exists. We refer the reader to [19] for more details on the Skolem-function generation.

**Fig. 2.** Region of validity computed for an example requiring AE-VAL to iterate two times.

Our approach presented in this paper relies on the fact that during each run, AE-VAL iteratively creates a set of formulas $\{P_i(x)\}$, such that each $P_i(x)$ has a common model with $S(x)$ and $P_i(x) \Rightarrow \exists y.\ T(x, y)$. After $n$ iterations, AE-VAL establishes a formula $R_n(x) \stackrel{\text{def}}{=} \bigvee_{i=1}^{n} P_i(x)$ which by construction implies $\exists y.\ T(x, y)$. If additionally $S(x) \Rightarrow R_n(x)$, the input formula is valid, and the algorithm terminates. Figure 2 shows a Venn diagram for an example of the opposite scenario: $R_2(x) = T_1(x) \lor T_2(x)$, but the input formula is invalid. However, models of each $S(x) \land P_i(x)$ can still be extended to a model of $T(x, y)$.

In general, if after $n$ iterations $S(x) \land T(x, y) \land \neg R_n(x)$ is unsatisfiable, then AE-VAL terminates. Note that the formula $\forall x.\ S(x) \land R_n(x) \Rightarrow \exists y.\ T(x, y)$ is valid by construction at any iteration of the algorithm. We say that $R_n(x)$ is a *region of validity*, and in this work, we are interested in the *maximal* regions of validity, i.e., the ones produced by disjoining all $\{P_i(x)\}$ produced by AE-VAL before termination and by conjoining the result with $S(x)$. Throughout the paper, we assume that all regions of validity are maximal.

**Lemma 1.** *Let* $R_n(x)$ *be the region of validity returned by* AE-VAL *for formula* $\forall x.\ S(x) \Rightarrow \exists y.\ T(x, y)$*. Then* $\forall x.\ S(x) \Rightarrow (R_n(x) \Leftrightarrow \exists y.\ T(x, y))$*.*

*Proof.* (⇒) By construction of $R_n(x)$.

(⇐) Suppose towards contradiction that the formula does not hold. Then there exists $x_0$ such that $S(x_0) \land (\exists y.\ T(x_0, y)) \land \neg R_n(x_0)$ holds. But this directly contradicts the termination condition of AE-VAL. Therefore the original formula does hold.
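The notions of validity, maximal region of validity, and Skolem function can be illustrated on finite domains, where they are computable by plain enumeration. The sketch below is a brute-force stand-in written for this illustration only: the function `ae_val_finite` and its signature are hypothetical and do not reflect how AE-VAL is implemented or invoked.

```python
def ae_val_finite(xs, ys, S, T):
    """Brute-force stand-in for a forall-exists query on finite domains.

    Decides  forall x. S(x) => exists y. T(x, y)  and returns a triple
    (valid, region, skolem): region is the maximal region of validity
    (already conjoined with S), skolem maps each x to a witness y.
    """
    witness = {}
    for x in xs:
        if S(x):
            for y in ys:
                if T(x, y):
                    witness[x] = y  # first witness found for this x
                    break
    region = set(witness)
    valid = all(x in region for x in xs if S(x))
    return valid, region, (witness if valid else None)


def T(x, y):
    return x + y == 2


# Invalid query: x = 3 and x = 4 have no y in 0..2 with x + y == 2.
print(ae_val_finite(range(5), range(3), lambda x: True, T))
# prints: (False, {0, 1, 2}, None)

# Restricting S to the region of validity makes the query valid.
print(ae_val_finite(range(5), range(3), lambda x: x <= 2, T))
# prints: (True, {0, 1, 2}, {0: 2, 1: 1, 2: 0})
```

In both calls the returned region coincides, on the models of S, with the set of x that admit a witness, which is the statement of Lemma 1 specialized to this toy setting.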

### **4 Validity-Guided Synthesis from Assume-Guarantee Contracts**

Algorithm 1, named JSyn-vg (for *validity guided*), shows the validity-guided technique that we use towards the automatic synthesis of implementations.


**Algorithm 1.** JSyn-vg (A: assumptions, G: guarantees)

The specification is written using the Assume-Guarantee convention that we described in Sect. 3 and is provided as an input. The algorithm relies on AE-VAL, for each call of which we write $\langle x, y, z \rangle \leftarrow$ AE-VAL(...): $x$ specifies if the given formula is *valid* or *invalid*, $y$ identifies the region of validity (in both cases), and $z$ is the Skolem function (only if the formula is valid).

The algorithm maintains a formula F(s) which is initially assigned *true* (line 1). It then attempts to strengthen F(s) until it only contains viable states (recall Eqs. 1 and 2), i.e., a greatest fixpoint is reached. We first encode Eq. 1 in a formula φ and then provide it as input to AE-VAL (line 4) which determines its validity (line 5). If the formula is valid, then a witness *Skolem* is non-empty. By construction, it contains valid assignments to the existentially quantified variables of φ. In the context of viability, this witness is capable of providing viable states that can be used as a safe reaction, given an input that satisfies the assumptions.

With the valid formula φ in hand, it remains to check that the fixpoint intersects with the initial states, i.e., to find a model of the formula in Eq. 2 by a simple satisfiability check. If a model exists, it is directly combined with the extracted witness and used towards an implementation of the system, and the algorithm terminates (line 7). Otherwise, the contract is unrealizable since either there are no states that satisfy the initial state guarantees $G_I$, or the set of viable states $F$ is empty.

If φ is not true for every possible assignment of the universally quantified variables, AE-VAL provides a *region of validity* Q(s, i) (line 11). At this point, one might assume that Q(s, i) is sufficient to restrict F towards a solution. This is not the case since Q(s, i) creates a subregion involving both state and input variables. As such, it may contain constraints over the contract's inputs beyond what is required by A, ultimately leading to implementations that only work correctly for a small part of the input domain.

Fortunately, we can again use AE-VAL's capability of providing regions of validity towards removing inputs from Q. Essentially, we want to remove those states from Q if even one input causes them to violate the formula on line 3. We denote by W the *violating region* of Q. To construct W, AE-VAL determines the validity of formula φ ← ∀s. (F(s) ⇒ ∃i.A(s, i) ∧ ¬Q(s, i)) (line 12) and computes a new region of validity.

If φ is invalid, it indicates that there are still non-violating states (i.e., outside W) that may lead to a fixpoint. Thus, the algorithm removes the unsafe states from F(s) in line 15, and iterates until a greatest fixpoint for F(s) is reached. If φ is valid, then every state in F(s) is unsafe, under a specific input that satisfies the contract assumptions (since ¬Q(s, i) holds in this case), and the specification is unrealizable (i.e., in the next iteration, the algorithm will reach line 9).
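The loop described above can be sketched on an explicit finite model, where both AE-VAL queries are replaced by enumeration. This is a hedged illustration reconstructed from the prose only: the function name `jsyn_vg_finite`, the returned triple, and the toy contract are hypothetical, and the sketch does not reproduce the listing of Algorithm 1 or its line numbers.

```python
def jsyn_vg_finite(states, inputs, A, G_I, G_T):
    """Explicit-state sketch of the validity-guided loop.

    Returns (verdict, initial_state, skolem), where skolem maps (s, i) to s'.
    """
    F = set(states)  # F(s) = true initially
    while True:
        # Stand-in for the first query: look for a successor inside F for every
        # (s, i) with F(s) and A(s, i); Q collects the pairs that succeed.
        skolem = {}
        for s in sorted(F):
            for i in inputs:
                if A(s, i):
                    succ = next((t for t in sorted(F) if G_T(s, i, t)), None)
                    if succ is not None:
                        skolem[(s, i)] = succ
        Q = set(skolem)
        valid = all((s, i) in Q for s in F for i in inputs if A(s, i))
        if valid:
            init = next((s for s in sorted(F) if G_I(s)), None)
            if init is None:
                return "unrealizable", None, None
            return "realizable", init, skolem
        # Stand-in for the second query: W holds the states of F that have at
        # least one admissible input outside Q.
        W = {s for s in F if any(A(s, i) and (s, i) not in Q for i in inputs)}
        F -= W  # block the violating region


# Hypothetical toy contract: keep a level within [1, 3]; the environment shifts
# it by d, and the system may correct by at most one unit per step.
def G_I(s):
    return s == 2


def G_T(s, d, t):
    return 1 <= t <= 3 and abs(t - (s + d)) <= 1


STATES, INPUTS = range(5), (-2, -1, 1, 2)

verdict, init, skolem = jsyn_vg_finite(
    STATES, INPUTS, lambda s, d: abs(d) == 1, G_I, G_T)
print(verdict, init)  # prints: realizable 2

# Dropping the assumption (A = true) lets the environment win.
print(jsyn_vg_finite(STATES, INPUTS, lambda s, d: True, G_I, G_T)[0])
# prints: unrealizable
```

With the assumption that disturbances have magnitude one, the first round blocks levels 0 and 4 and the second round finds a fixpoint {1, 2, 3}. Without the assumption, F shrinks to the empty set and the initial-state check fails, mirroring the unrealizable case discussed in the text.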

#### **4.1 Soundness**

**Lemma 2.** Viable ⇒ F *is an invariant for Algorithm 1.*

*Proof.* It suffices to show this invariant holds each time F is assigned. On line 1, this is trivial. For line 15, we can assume that Viable ⇒ F holds prior to this line. Suppose towards contradiction that the assignment on line 15 violates the invariant. Then there exists $s_0$ such that $F(s_0)$, $W(s_0)$, and $\mathsf{Viable}(s_0)$ all hold. Since W is the region of validity for φ on line 12, we have $W(s_0) \land F(s_0) \Rightarrow \exists i.\ A(s_0, i) \land \neg Q(s_0, i)$ by Lemma 1. Given that $W(s_0)$ and $F(s_0)$ hold, let $i_0$ be such that $A(s_0, i_0)$ and $\neg Q(s_0, i_0)$ hold. Since Q is the region of validity for φ on line 3, we have $F(s_0) \land A(s_0, i_0) \land (\exists s'.\ G_T(s_0, i_0, s') \land F(s')) \Rightarrow Q(s_0, i_0)$ by Lemma 1. Since $F(s_0)$, $A(s_0, i_0)$ and $\neg Q(s_0, i_0)$ hold, we conclude that $(\exists s'.\ G_T(s_0, i_0, s') \land F(s')) \Rightarrow \bot$. We know that Viable ⇒ F holds prior to line 15, thus $(\exists s'.\ G_T(s_0, i_0, s') \land \mathsf{Viable}(s')) \Rightarrow \bot$. But this is a contradiction since $\mathsf{Viable}(s_0)$ holds. Therefore the invariant holds on line 15.

**Theorem 1.** *The* realizable *and* unrealizable *results of Algorithm 1 are sound.*

*Proof.* If Algorithm 1 terminates, then the formula for φ on line 3 is valid. Rewritten, F satisfies the formula

$$\forall s.\;F(s)\Rightarrow \left(\forall i.\;A(s,i)\Rightarrow \exists s'.\;G_T(s,i,s')\land F(s')\right).\tag{3}$$

Let the function f be defined over state predicates as

$$f = \lambda V.\lambda s.\ \forall i.\ A(s, i) \Rightarrow \exists s'.\ G_T(s, i, s') \land V(s').\tag{4}$$

State predicates are equivalent to subsets of the state space and form a lattice in the natural way. Moreover, f is monotone on this lattice. From Eq. 3 we have F ⇒ f(F). Thus F is a post-fixed point of f. In Eq. 1, Viable is defined as the greatest fixed point of f. Thus F ⇒ Viable by the Knaster-Tarski theorem. Combining this with Lemma 2, we have F = Viable. Therefore the check on line 7 is equivalent to the check in Eq. 2 for realizability.

**Fig. 3.** An Assume-Guarantee contract for the Cinderella-Stepmother game in Lustre.

#### **4.2 Termination on Finite Models**

**Lemma 3.** *Every loop iteration in Algorithm 1 either terminates or removes at least one state from* F*.*

*Proof.* It suffices to show that at least one state is removed from F on line 15. That is, we want to show that $F \cap W \neq \emptyset$ since this intersection is what is removed from F by line 15.

If the query on line 4 is valid, then the algorithm terminates. If not, then there exist a state $s^*$ and an input $i^*$ such that $F(s^*)$ and $A(s^*, i^*)$ hold and there is no state $s'$ where both $G_T(s^*, i^*, s')$ and $F(s')$ hold. Thus, $\neg Q(s^*, i^*)$, and $s^* \in \mathit{violatingRegion}$, so $W \neq \emptyset$. Next, suppose towards contradiction that $F \cap W = \emptyset$ and $W \neq \emptyset$. Since $W$ is the region of validity for φ on line 12, we know that F lies completely outside the region of validity and therefore $\forall s.\ \neg\exists i.\ A(s, i) \land \neg Q(s, i)$ by Lemma 1. Rewritten, $\forall s, i.\ A(s, i) \Rightarrow Q(s, i)$. Note that Q is the region of validity for φ on line 3. Thus A is completely contained within the region of validity and formula φ is valid. This is a contradiction since if φ is valid then line 15 will not be executed in this iteration of the loop. Therefore $F \cap W \neq \emptyset$ and at least one state is removed from $F$ on line 15.

**Theorem 2.** *For finite models, Algorithm 1 terminates.*

*Proof.* Immediately from Lemma 3 and the fact that AE-VAL terminates on finite models [10].

#### **4.3 Applying JSyn-vg to the Cinderella-Stepmother Game**

Figure 3 shows one possible interpretation of the contract designed for the instance of the Cinderella-Stepmother game that we introduced in Sect. 2. The contract is expressed in Lustre [18], a language that has been extensively used for specification as well as implementation of safety-critical systems, and is the kernel language in SCADE, a popular tool in model-based development. The contract is defined as a Lustre node game, with a global constant C denoting the bucket capacity. The node describes the game itself, through the problem's input and output variables. The main input is Stepmother's distribution of one unit of water over five different input variables, i1 to i5. While the node contains a sixth input argument, namely e, this is in fact used as the output of the system that we want to implement, representing Cinderella's choice at each of her turns.

We specify the system's inputs i1, ..., i5 using the REALIZABLE statement and define the contract's assumptions over them: $A(i_1,\ldots,i_5) = (\bigwedge_{k=1}^{5} i_k \geq 0.0) \land (\sum_{k=1}^{5} i_k = 1.0)$. The assignment to boolean variable guarantee (distinguished via the PROPERTY statement) imposes the guarantee constraints on the buckets' states through the entire duration of the game, using the local variables b1 to b5. Initially, each bucket is empty, and with each transition to a new state, the contents depend on whether Cinderella chose the specific bucket, or an adjacent one. If so, the value of each $b_k$ at the next turn becomes equal to the value of the corresponding input variable $i_k$. Formally, for the initial state, $G_I(C, b_1,\ldots,b_5) = (\bigwedge_{k=1}^{5} b_k = 0.0) \land (\bigwedge_{k=1}^{5} b_k \leq C)$, while the transitional guarantee is $G_T([C, b_1,\ldots,b_5, e], i_1,\ldots,i_5, [C', b'_1,\ldots,b'_5, e']) = (\bigwedge_{k=1}^{5} b'_k = \mathit{ite}(e = k \lor e = k_{\mathit{prev}}, i_k, b_k + i_k)) \land (\bigwedge_{k=1}^{5} b'_k \leq C')$, where $k_{\mathit{prev}} = 5$ if $k = 1$, and $k_{\mathit{prev}} = k - 1$ otherwise. Interestingly, the lack of explicit constraints over $e$, i.e. Cinderella's choice, permits the action of Cinderella skipping her current turn, i.e. she does not choose to empty any of the buckets. With the addition of the guarantee $(e = 1) \lor \ldots \lor (e = 5)$, the contract is still realizable, and the implementation is verifiable, but Cinderella is not allowed to skip her turn anymore.

If the bucket was not covered by Cinderella's choice, then its contents are updated by adding Stepmother's distribution to the volume of water that the bucket already had. The arrow (->) operator distinguishes the initial state (on the left) from subsequent states (on the right), and variable values in the previous state can be accessed using the pre operator. The contract should only be realizable if, assuming valid inputs given by the Stepmother (i.e. positive values to input variables that add up to one water unit), Cinderella can keep reacting indefinitely, by providing outputs that satisfy the guarantees (i.e. she empties buckets in order to prevent overflow in Stepmother's next turn). We provide the contract in Fig. 3 as input to Algorithm 1 which then iteratively attempts to construct a fixpoint of viable states, closed under the transition relation.
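The transitional guarantee can be read operationally as a single game step. The following sketch implements that reading for concrete numeric values; the function names `step` and `valid_input` and the tolerance parameter are hypothetical choices made for this illustration, and the code is not the Lustre contract of Fig. 3.

```python
def step(buckets, water, e, capacity):
    """One transition following G_T; returns the next contents or None on overflow.

    buckets, water: tuples b_1..b_5 and i_1..i_5 (index 0 holds bucket 1).
    e: Cinderella's choice in 1..5; any other value means she skips her turn.
    """
    n = len(buckets)
    nxt = []
    for k in range(1, n + 1):
        k_prev = n if k == 1 else k - 1
        emptied = e == k or e == k_prev  # bucket k is covered by choice e
        nxt.append(water[k - 1] if emptied else buckets[k - 1] + water[k - 1])
    return tuple(nxt) if all(b <= capacity for b in nxt) else None


def valid_input(water, tol=1e-9):
    """Assumption A: non-negative amounts that add up to one unit."""
    return all(w >= 0.0 for w in water) and abs(sum(water) - 1.0) <= tol


b = (2.0, 0.0, 0.0, 0.0, 0.0)
i = (0.5, 0.5, 0.0, 0.0, 0.0)
print(valid_input(i))       # prints: True
print(step(b, i, 5, 2.0))   # prints: (0.5, 0.5, 0.0, 0.0, 0.0)
print(step(b, i, 3, 2.0))   # prints: None
```

Choosing e = 5 covers buckets 5 and 1 (since $k_{\mathit{prev}} = 5$ for $k = 1$), so the full first bucket is emptied before the new water arrives; choosing e = 3 leaves it at 2.5 units, which exceeds the capacity of 2.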

Initially F = *true*, and we query AE-VAL for the validity of formula $\forall i_1,\ldots,i_5, b_1,\ldots,b_5.\ A(i_1,\ldots,i_5) \Rightarrow \exists b'_1,\ldots,b'_5, e.\ G_T(i_1,\ldots,i_5, b_1,\ldots,b_5, b'_1,\ldots,b'_5, e)$. Since F does not yet impose any constraint on the states, there are states satisfying A for which no transition satisfies $G_T$. In particular, one such counterexample identified by AE-VAL is represented by the set of assignments $\mathit{cex} = \{\ldots, b_4 = 3025, i_4 = 0.2, b'_4 = 3025.2, \ldots\}$, where the already overflown bucket $b_4$ receives additional water during the transition to the next state, violating the contract guarantees. In addition, AE-VAL provides us with a region of validity $Q(i_1,\ldots,i_5, b_1,\ldots,b_5)$, a formula for which $\forall i_1,\ldots,i_5, b_1,\ldots,b_5.\ A(i_1,\ldots,i_5) \land Q(i_1,\ldots,i_5, b_1,\ldots,b_5) \Rightarrow \exists b'_1,\ldots,b'_5, e.\ G_T(i_1,\ldots,i_5, b_1,\ldots,b_5, b'_1,\ldots,b'_5, e)$ is valid. The precise encoding of Q is too large to be presented in the paper; intuitively, it contains some constraints on $i_1,\ldots,i_5$ and $b_1,\ldots,b_5$ which are stronger than A and which block the inclusion of violating states such as the one described by $\mathit{cex}$.

Since Q is defined over both state and input variables, it might contain constraints over the inputs, which is an undesirable side-effect. In the next step, AE-VAL decides the validity of formula $\forall b_1,\ldots,b_5.\ \exists i_1,\ldots,i_5.\ A(i_1,\ldots,i_5) \land \neg Q(i_1,\ldots,i_5, b_1,\ldots,b_5)$ and extracts a violating region W over $b_1,\ldots,b_5$. The precise encoding of W is also too large to be presented in the paper; intuitively, it captures certain steps in which Cinderella may not take the optimal action. Blocking them leads us eventually to proving the contract's realizability.

From this point on, the algorithm continues following the steps explained above. In particular, it terminates after one more refinement, at depth 2. At that point, the refined version of φ is valid, and AE-VAL constructs a witness containing valid reactions to environment behavior. In general, the witness is described through the use of nested *if-then-else* blocks, where the conditions are subsets of the antecedent of the implication in formula φ, while the bodies contain valid assignments to state variables for the corresponding subset.

### **5 Implementation and Evaluation**

The implementation of the algorithm has been added to a branch of the JKind [13] model checker<sup>1</sup>. JKind officially supports synthesis using a k-inductive approach, named JSyn [19]. For clarity, we named our validity-guided technique JSyn-vg (i.e., validity-guided synthesis). JKind uses Lustre [18] as its specification and implementation language. JSyn-vg encodes Lustre specifications in the language of linear real and integer arithmetic (LIRA) and communicates them to AE-VAL<sup>2</sup>. Skolem functions returned by AE-VAL are then translated into an efficient and practical implementation. To compare the quality of implementations against JSyn, we use SMTLib2C, a tool that has been specifically developed to translate Skolem functions to C implementations<sup>3</sup>.

#### **5.1 Experimental Results**

We evaluated JSyn-vg by synthesizing implementations for 124 contracts<sup>4</sup> originating from a broad variety of contexts. Since we have been unable to find past work that contained benchmarks directly relevant to our approach, we propose a comprehensive collection of contracts that can be used by the research community for future advancements in reactive system synthesis for contracts that rely on infinite theories. Our benchmarks are split into three categories:

<sup>1</sup> The JKind fork with JSyn-vg is available at https://goo.gl/WxupTe.

<sup>2</sup> The AE-VAL tool is available at https://goo.gl/CbNMVN.

<sup>3</sup> The SMTLib2C tool is available at https://goo.gl/EvNrAU.

<sup>4</sup> All of the benchmark contracts can be found at https://goo.gl/2p4sT9.


All of the synthesized implementations were verified against the original contracts using JKind.

The goal of this experiment was to determine the performance and generality of the JSyn-vg algorithm. We compared against the existing JSyn algorithm, and for the Cinderella model, we compared against [2] (this was the only synthesis problem in the paper). We examined the following aspects:


Since JKind already supports synthesis through JSyn, we were able to directly compare JSyn-vg against JSyn's k-inductive approach. We ran the experiments using a computer with Intel Core i3-4010U 1.70 GHz CPU and 16 GB RAM.

A listing of the statistics that we tracked while running experiments is presented in Table 1. Fig. 4a shows the time allocated by JSyn and JSyn-vg to solve each problem, with JSyn-vg outperforming JSyn for the vast majority of the benchmark suite, often by a margin greater than 50%. Fig. 4b, on the other hand, depicts small differences in the overall size between the synthesized implementations. While it would be reasonable to conclude that there are no noticeable improvements, the big picture is different: solutions by JSyn-vg always require just a single Skolem function, but solutions by JSyn may require several (k − 1 to initialize the system, and one for the inductive step). In our evaluation, JSyn proved the realizability of the majority of benchmarks by constructing proofs of length k = 0, which essentially means that the entire space of states is an inductive invariant. However, several spikes in Fig. 4b refer to benchmarks for which JSyn constructed a proof of length k > 0, which was significantly longer than the corresponding proof by JSyn-vg. Interestingly, we also noticed cases where JSyn implementations are (insignificantly) shorter. This provides us with another observation regarding the formulation of the problem for k = 0 proofs. In these cases, JSyn proves the existence of viable states, starting from a set of *pre-initial* states, where the contract does not need to hold. This has direct implications for the way that the ∀∃-formulas are constructed in JSyn's underlying machinery, where the assumptions are "baked" into the transition relation, thus affecting the performance of AE-VAL.


**Table 1.** Benchmark statistics.

**Table 2.** Cinderella-Stepmother results.


**Fig. 4.** Experimental results.

One last statistic that we tracked was the performance of the synthesized C implementations in terms of execution time, which can be seen in Fig. 4c. The performance was computed as the mean of 1000000 iterations of executing each implementation using random input values. According to the figure as well as Table 1, the differences are minuscule on average.

Figure 4 does not cover the entirety of the benchmark suite. Of the original 124 problems, eleven cannot be solved by JSyn's k-inductive approach. Four of these files are variations of the Cinderella-Stepmother game using different representations of the game, as well as two different values for the bucket capacity (2 and 3). Using the variation in Fig. 3 as an input to JSyn, we receive an "unrealizable" answer, with the counterexample shown in Fig. 5. Reading through the feedback provided by JSyn, it is apparent that the underlying SMT solver is incapable of choosing the correct buckets to empty, leading eventually to a state where an overflow occurs for the third bucket. As we already discussed though, a winning strategy exists for the Cinderella game, as long as the bucket capacity C is between 1.5 and 3. This provides an excellent demonstration of the inherent weakness of JSyn for determining unrealizability. JSyn-vg's validity-guided approach is able to prove the realizability of these contracts, as well as synthesize an implementation for each.

Table 2 shows how JSyn-vg performed on the four contracts describing the Cinderella-Stepmother game. We used two different interpretations for the game, and exercised both for the cases where the bucket capacity C is equal to 2 and 3. Regarding the synthesized implementations, their size is analogous to the complexity of the program (Cinderella2 contains more local variables and a helper function to empty buckets). Despite this, the implementation performance remains the same across all implementations. For reference, the table also contains the results from the template-based approach followed in Consynth [2]. From the results, it is apparent that providing templates yields better performance for the case of C = 3, but our approach outperforms Consynth when it comes to solving the harder case of C = 2. Finally, the original paper for Consynth also explores the synthesis of winning strategies for Stepmother using the liveness property that a bucket will eventually overflow. While JKind does not natively support liveness properties, we successfully synthesized an implementation for Stepmother using a bounded notion of liveness with counters. We leave an evaluation of this category of specifications for future work.

Overall, JSyn-vg's validity-guided approach provides significant advantages over the k-inductive technique followed in JSyn, and effectively expands JKind's solving capabilities regarding specification realizability. On top of that, it provides an efficient "hands-off" approach that is capable of solving complex games. The most significant contribution, however, is the applicability of this approach, as it is not tied to a specific environment since it can be extended to support more theories, as well as categories of specification.

**Fig. 5.** Spurious counterexample for Cinderella-Stepmother example using JSyn

#### **6 Related Work**

The work presented in this paper is closely related to approaches that attempt to construct infinite-state implementations. Some focus on the continuous interaction of the user with the underlying machinery, either through the use of templates [2,28], or environments where the user attempts to guide the solver by choosing reactions from a collection of different interpretations [26]. In contrast, our approach is completely automatic and does not require human ingenuity to find a solution. Most importantly, the user does not need to be deeply familiar with the problem at hand.

Iterative strengthening of candidate formulas is also used in abductive inference [8] of loop invariants. Their approach generates candidate invariants as maximum universal subsets (MUS) of quantifier-free formulas of the form φ ⇒ ψ. While a MUS may be sufficient to prove validity, it may also mislead the invariant search, so the authors use a backtracking procedure that discovers new subsets while avoiding spurious results. By comparison, in our approach the regions of validity are maximal and therefore backtracking is not required. More importantly, reactive synthesis requires mixed-quantifier formulas, and it requires that inputs are unconstrained (other than by the contract assumptions), so substantial modifications to the MUS algorithm would be necessary to apply the approach of [8] for reactive synthesis.

The concept of synthesizing implementations by discovering fixpoints was mostly inspired by IC3/PDR [4,9], which was first introduced in the context of verification. Work from Cimatti *et al.* effectively applied this idea to parameter synthesis in the HyComp model checker [5,6]. Discovering fixpoints to synthesize reactive designs was first extensively covered by Piterman *et al.* [23], who proved that the problem can be solved in cubic time for the class of GR(1) specifications. The algorithm requires the discovery of least fixpoints for the state variables, each one covering a greatest fixpoint of the input variables. If the specification is realizable, the entirety of the input space is covered by the greatest fixpoints. In contrast, our approach computes a single greatest fixpoint over the system's outputs and avoids the partitioning of the input space. As the tools use different notations and support different logical fragments, practical comparisons are not straightforward, and thus are left for the future.

More recently, Preiner *et al.* presented work on model synthesis [24] that employs a counterexample-guided refinement process [25] to construct and check candidate models. Internally, it relies on enumerative learning, a syntax-based technique that enumerates expressions, checks their validity against ground test cases, and proceeds to generalize the expressions by constructing larger ones. In contrast, our approach is syntax-insensitive in terms of generating regions of validity. In general, enumeration techniques such as the one used in ConSynth's underlying E-HSF engine [2] are not an optimal strategy for our class of problems, since the witnesses constructed for the most complex contracts are described by nested if-then-else expressions of depth (i.e., number of branches) 10–20, a point at which space explosion is difficult to handle since the number of candidate solutions is large.

#### **7 Conclusion and Future Work**

We presented a novel and elegant approach towards the synthesis of reactive systems, using only the knowledge provided by the system specification expressed in infinite theories. The main goal is to converge to a fixpoint by iteratively blocking subsets of unsafe states from the problem space. This is achieved through the continuous extraction of regions of validity which hint towards subsets of states that lead to a candidate implementation.
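To make the fixpoint idea concrete, the following sketch computes the greatest set of viable states of a small finite-state safety game by repeatedly blocking states from which some input has no safe response. It is only a finite-state analogue under our own assumptions: the explicit state, input, and output sets, the helper names `viable_states`, `step`, and `safe`, and the toy "tank" game are all hypothetical, and the procedure is not the JSyn-vg algorithm, which operates on ∀∃-formulas over infinite theories.

```python
def viable_states(states, inputs, outputs, step, safe):
    """Greatest fixpoint of: s is safe and, for every input, some output
    leads to a successor that is itself viable."""
    viable = {s for s in states if safe(s)}
    while True:
        # States that lose viability with respect to the current candidate set.
        blocked = {
            s for s in viable
            if not all(any(step(s, i, o) in viable for o in outputs)
                       for i in inputs)
        }
        if not blocked:          # nothing left to block: fixpoint reached
            return viable
        viable -= blocked


# Hypothetical toy game: a tank with levels 0..5, where level 5 is an overflow.
# The environment adds 1 or 2 units; the system drains `o` units.
def step(level, added, drained):
    return min(5, max(0, level + added - drained))


def safe(level):
    return level < 5


if __name__ == "__main__":
    print(viable_states(range(6), [1, 2], [0, 1, 2], step, safe))
    print(viable_states(range(6), [1, 2], [0, 1], step, safe))
```

In this toy game, a system that can drain up to two units keeps every safe level viable, whereas a system limited to one unit loses every state after a few blocking rounds, which corresponds to an unrealizable specification.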

This is the first complete attempt, to the best of our knowledge, on handling valid subsets of a ∀∃-formula to construct a greatest fixpoint on specifications expressed using infinite theories. We were able to prove its effectiveness in practice, by comparing it to an already existing approach that focuses on constructing k-inductive proofs of realizability. We showed how the new algorithm performs better than the k-inductive approach, both in terms of performance as well as the soundness of results. In the future, we would like to extend the applicability of this algorithm to other areas in formal verification, such as invariant generation. Another interesting goal is to make the proposed benchmark collection available to competitions such as SYNTCOMP, by establishing a formal extension for the TLSF format to support infinite-state problems [17]. Finally, a particularly interesting challenge is that of mapping infinite theories to finite counterparts, enabling the synthesis of secure and safe implementations.

**Data Availability Statement.** The datasets generated during and/or analyzed during the current study are available in the figshare repository: https://doi.org/10.6084/m9.figshare.5904904.v1 [20].

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **RVHyper: A Runtime Verification Tool for Temporal Hyperproperties**

Bernd Finkbeiner, Christopher Hahn, Marvin Stenger<sup>(✉)</sup>, and Leander Tentrup

> Reactive Systems Group, Saarland University, Saarbrücken, Germany {finkbeiner,hahn,stenger,tentrup}@react.uni-saarland.de

**Abstract.** We present RVHyper, a runtime verification tool for hyperproperties. Hyperproperties, such as non-interference and observational determinism, relate multiple computation traces with each other. Specifications are given as formulas in the temporal logic HyperLTL, which extends linear-time temporal logic (LTL) with trace quantifiers and trace variables. RVHyper processes execution traces sequentially until a violation of the specification is detected. In this case, a counter example, in the form of a set of traces, is returned. As an example application, we show how RVHyper can be used to detect spurious dependencies in hardware designs.

### **1 Introduction**

*Hyperproperties* [4] generalize trace properties in that they not only check the correctness of *individual* computation traces in isolation, but relate *multiple* computation traces to each other. HyperLTL [3] is a logic for expressing temporal hyperproperties, by extending linear-time temporal logic with *explicit* trace quantification. HyperLTL has been used to specify a variety of information-flow and security properties. Examples include classical properties like non-interference and observational determinism, as well as quantitative information-flow properties, symmetries in hardware designs, and formally verified error correcting codes [8]. While model checking and satisfiability checking tools for HyperLTL already exist [5,8], the *runtime verification* of HyperLTL specifications has so far, despite recent theoretical progress [1,2,7], not been supported by practical tool implementations.

Monitoring hyperproperties is difficult: in principle, the monitor not only needs to process every observed trace, but must also *store* every trace observed so far, so that future traces can be compared with the traces seen so far. On the other hand, a runtime verification tool for hyperproperties is certainly useful, in particular if the implementation of a security critical system is not available. Even without access to the source code, monitoring the observable execution traces still detects insecure information flow.

This work was partially supported by the German Research Foundation (DFG) as part of the Collaborative Research Center "Methods and Tools for Understanding and Controlling Privacy" (SFB 1223) and by the European Research Council (ERC) Grant OSARES (No. 683300).

© The Author(s) 2018

D. Beyer and M. Huisman (Eds.): TACAS 2018, LNCS 10806, pp. 194–200, 2018. https://doi.org/10.1007/978-3-319-89963-3\_11

In this paper, we present RVHyper, a runtime verification tool for monitoring temporal hyperproperties. RVHyper tackles this challenging problem by implementing two major optimizations: (1) a *trace analysis*, which detects all redundant traces that can be omitted during the monitoring process and (2) a *specification analysis* to detect exploitable properties of a hyperproperty, such as *symmetry*.

We have applied RVHyper in classical information-flow security, such as checking for violations of observational determinism. HyperLTL is, however, not limited to security policies. As an example of such an application beyond security, we show how RVHyper can be used to detect spurious dependencies in hardware designs.

### **2 RVHyper**

In this section we give an overview of the monitoring approach, including the input and output of the monitoring algorithm and the two major optimization techniques implemented in RVHyper.

**Specification.** The input to RVHyper is a HyperLTL specification. HyperLTL [3] is a temporal logic for specifying hyperproperties. The logic extends LTL with quantification over trace variables π and a method to link atomic propositions to specific traces. The set of trace variables is V. Formulas in HyperLTL are given by the grammar

$$\begin{aligned} \varphi &::= \forall \pi.\, \varphi \mid \exists \pi.\, \varphi \mid \psi, \text{ and} \\ \psi &::= a_{\pi} \mid \neg \psi \mid \psi \lor \psi \mid \bigcirc \psi \mid \psi \,\mathcal{U}\, \psi, \end{aligned}$$

where a ∈ AP and π ∈ V. The finite trace semantics [2] for HyperLTL is based on the finite trace semantics of LTL. In the following, when using L(ϕ) we refer to the finite trace semantics of a HyperLTL formula ϕ. Let t be a finite trace, let ε denote the empty trace, and let |t| denote the length of a trace. Since we are in a finite trace setting, t[i, . . .] denotes the subsequence from position i to position |t| − 1. Let Π*fin* : V → Σ<sup>∗</sup> be a partial function mapping trace variables to finite traces. We define ε[0] as the empty set. Π*fin*[i, . . .] denotes the trace assignment that is equal to Π*fin*(π)[i, . . .] for all π. We define a subsequence of t as follows.

$$t[i,j] = \begin{cases} \epsilon & \text{if } i \ge |t| \\ t[i, \min(j, |t|-1)] & \text{otherwise} \end{cases}$$

$$\begin{array}{ll} \Pi_{fin} \models_{T} a_{\pi} & \text{if } a \in \Pi_{fin}(\pi)[0] \\ \Pi_{fin} \models_{T} \neg \varphi & \text{if } \Pi_{fin} \not\models_{T} \varphi \\ \Pi_{fin} \models_{T} \varphi \vee \psi & \text{if } \Pi_{fin} \models_{T} \varphi \text{ or } \Pi_{fin} \models_{T} \psi \\ \Pi_{fin} \models_{T} \bigcirc \varphi & \text{if } \Pi_{fin}[1, \ldots] \models_{T} \varphi \\ \Pi_{fin} \models_{T} \varphi \,\mathcal{U}\, \psi & \text{if } \exists i \ge 0.\ \Pi_{fin}[i, \ldots] \models_{T} \psi \wedge \forall\, 0 \le j < i.\ \Pi_{fin}[j, \ldots] \models_{T} \varphi \\ \Pi_{fin} \models_{T} \exists \pi.\, \varphi & \text{if there is some } t \in T \text{ such that } \Pi_{fin}[\pi \mapsto t] \models_{T} \varphi \end{array}$$
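The finite trace semantics above can be read as a recursive evaluator. The following sketch is our own illustration, not part of RVHyper: formulas are encoded as nested tuples, traces as tuples of sets of atomic propositions, and the names `holds` and `shift` are hypothetical. Universal quantification is treated as the dual of the existential case, which is an assumption since only the existential rule is listed.

```python
def shift(assignment, i):
    """Suffix assignment Π[i, ...]: drop the first i positions of every trace."""
    return {pi: t[i:] for pi, t in assignment.items()}


def holds(formula, assignment, traces):
    """Evaluate a tuple-encoded HyperLTL formula over finite traces."""
    op = formula[0]
    if op == "ap":                       # a_π holds if a is in Π(π)[0]; ε[0] is empty
        _, a, pi = formula
        t = assignment[pi]
        return len(t) > 0 and a in t[0]
    if op == "not":
        return not holds(formula[1], assignment, traces)
    if op == "or":
        return (holds(formula[1], assignment, traces)
                or holds(formula[2], assignment, traces))
    if op == "next":
        return holds(formula[1], shift(assignment, 1), traces)
    if op == "until":
        _, phi, psi = formula
        # Beyond the longest trace every suffix is ε, so one extra step suffices.
        horizon = max((len(t) for t in assignment.values()), default=0)
        for i in range(horizon + 1):
            suffix = shift(assignment, i)
            if holds(psi, suffix, traces):
                return True
            if not holds(phi, suffix, traces):
                return False
        return False
    if op == "exists":
        _, pi, body = formula
        return any(holds(body, {**assignment, pi: t}, traces) for t in traces)
    if op == "forall":                   # assumed dual of the existential rule
        _, pi, body = formula
        return all(holds(body, {**assignment, pi: t}, traces) for t in traces)
    raise ValueError(f"unknown operator: {op}")


if __name__ == "__main__":
    t1 = (frozenset({"a"}), frozenset({"a"}), frozenset({"b"}))
    t2 = (frozenset({"a"}), frozenset({"b"}))
    until = ("until", ("ap", "a", "p"), ("ap", "b", "q"))
    print(holds(("exists", "p", ("exists", "q", until)), {}, [t1, t2]))
    print(holds(("forall", "p", ("forall", "q", until)), {}, [t1, t2]))
```

The bound on the until loop relies on the convention that all suffixes beyond the end of a trace equal ε, so checking positions past the longest trace cannot change the result.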

**input**: ∀<sup>n</sup> HyperLTL formula ϕ, set of traces T, fresh trace t

**output**: satisfied or n-ary tuple witnessing violation

M<sub>ϕ</sub> = build_template(ϕ);

**for** *each tuple* N ∈ (T ∪ {t})<sup>n</sup> **do**

&nbsp;&nbsp;&nbsp;&nbsp;**if** M<sub>ϕ</sub> *accepts* N **then** proceed; **else return** N; **end**

**end**

**return** satisfied;

**Algorithm 1.** A high-level sketch of the monitoring algorithm for ∀<sup>n</sup> HyperLTL formulas.

**input**: HyperLTL formula ϕ, redundancy free trace set T, fresh trace t

**output**: redundancy free set of traces T*min* ⊆ T ∪ {t}

M<sub>ϕ</sub> = build_template(ϕ)

**foreach** t′ ∈ T **do**

&nbsp;&nbsp;&nbsp;&nbsp;**if** t′ *dominates* t **then** return T **end**

**end**

**foreach** t′ ∈ T **do**

&nbsp;&nbsp;&nbsp;&nbsp;**if** t *dominates* t′ **then** T := T \ {t′} **end**

**end**

**return** T ∪ {t}

**Algorithm 2.** Trace analysis algorithm to minimize trace storage.

For example, the above-mentioned observational determinism can be formalized as the HyperLTL formula ∀π. ∀π′. (O<sub>π</sub> = O<sub>π′</sub>) W (I<sub>π</sub> ≠ I<sub>π′</sub>), where W is the weak version of U.

**Input and Output.** The input of RVHyper consists of a HyperLTL formula and the observed behavior of the system under consideration. The observed behavior is represented as a trace set T, where each t ∈ T represents a previously observed execution of the system under consideration. If RVHyper detects that the system violates the hyperproperty, it outputs a counter example, i.e., a k-ary tuple of traces, where k is the number of quantifiers in the HyperLTL formula.

**Monitoring Algorithm.** Given a HyperLTL formula ϕ and a trace set T, RVHyper processes a fresh trace under consideration as depicted in Algorithm 1. The algorithm revolves around a *monitor template* M<sub>ϕ</sub>, which is constructed from the HyperLTL formula ϕ. The basic idea of the monitor template is that it still contains all trace variables of ϕ, which can be initialized with explicit traces at runtime. This way, the automaton for the monitor template is constructed only once, as a preprocessing step.

RVHyper initializes the monitor template for each k-ary combination of traces in T ∪ {t}. If one tuple violates the hyperproperty, RVHyper returns that k-ary tuple of traces as a counter example, otherwise RVHyper returns *satisfied*.
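The loop of Algorithm 1 can be sketched as follows. This is a minimal illustration under our own assumptions, not RVHyper's implementation: the monitor template is replaced by a plain Python predicate `accepts`, the function names `monitor` and `obs_det` are hypothetical, traces are lists of (input, output) pairs, and traces of different lengths are compared only on their common prefix.

```python
from itertools import product


def obs_det(t1, t2):
    """(O_π = O_π') W (I_π ≠ I_π') on the common prefix of two traces."""
    for (in1, out1), (in2, out2) in zip(t1, t2):
        if in1 != in2:       # inputs differ: the weak until is released
            return True
        if out1 != out2:     # same inputs so far, but outputs differ
            return False
    return True              # outputs agreed for the whole common prefix


def monitor(accepts, traces, fresh, n):
    """Return None if every n-tuple over T ∪ {t} is accepted,
    otherwise the first violating tuple (the counter example)."""
    candidates = list(traces) + [fresh]
    for tup in product(candidates, repeat=n):
        if not accepts(*tup):
            return tup
    return None


if __name__ == "__main__":
    stored = [[(0, 0), (1, 1)]]
    print(monitor(obs_det, stored, [(0, 0), (0, 1)], 2))   # satisfied
    print(monitor(obs_det, stored, [(0, 0), (1, 0)], 2))   # violating pair
```

As in Algorithm 1, the sketch enumerates all tuples over T ∪ {t}; skipping tuples that do not contain the fresh trace would be a further optimization that is not shown here.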

**Trace Analysis: Minimizing Trace Storage.** The main obstacle in monitoring hyperproperties is the potentially unbounded space consumption. RVHyper uses a *trace analysis* to detect redundant traces, with respect to a given HyperLTL formula, i.e., traces that can be safely discarded without losing any information and without losing the ability to return a counter example.

RVHyper's trace analysis is based on the definition of trace redundancy: we say a fresh trace t is (T, ϕ)-redundant if T is a model of ϕ if and only if T ∪ {t} is a model of ϕ. The idea, depicted in Algorithm 2, is to check whether another trace t′ contains at least as much information as t: we say that t′ dominates t if ⋀<sub>π∈V</sub> L(M<sub>ϕ</sub>[t′/π]) ⊆ L(M<sub>ϕ</sub>[t/π]). For a fresh incoming trace, RVHyper performs this language inclusion check in both directions in order to compute the minimal trace set that must be stored to monitor the hyperproperty under consideration.
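The dominance check and the pruning of Algorithm 2 can be illustrated on a small example. The sketch below rests on strong simplifying assumptions of ours: instead of an automata-based language inclusion check, it enumerates a small finite universe of candidate traces, it handles only formulas with two trace variables, and the names `language`, `dominates`, `minimize`, and `obs_det` are hypothetical.

```python
from itertools import product


def obs_det(t1, t2):
    """(O_π = O_π') W (I_π ≠ I_π') on the common prefix of two traces."""
    for (in1, out1), (in2, out2) in zip(t1, t2):
        if in1 != in2:
            return True
        if out1 != out2:
            return False
    return True


def language(accepts, t, position, universe):
    """Traces u accepted when t is plugged into the given trace variable."""
    if position == 0:
        return {u for u in universe if accepts(t, u)}
    return {u for u in universe if accepts(u, t)}


def dominates(accepts, t_prime, t, universe):
    """t' dominates t if, for both variables, L(M[t'/π]) ⊆ L(M[t/π])."""
    return all(
        language(accepts, t_prime, p, universe) <= language(accepts, t, p, universe)
        for p in (0, 1)
    )


def minimize(accepts, stored, fresh, universe):
    """Algorithm 2: discard the fresh trace if dominated, else prune stored traces."""
    if any(dominates(accepts, s, fresh, universe) for s in stored):
        return list(stored)
    kept = [s for s in stored if not dominates(accepts, fresh, s, universe)]
    return kept + [fresh]


# Assumed universe: all traces of length 2 over binary inputs and outputs.
LETTERS = list(product((0, 1), repeat=2))
UNIVERSE = [tuple(w) for w in product(LETTERS, repeat=2)]

if __name__ == "__main__":
    short, long_ = ((0, 0),), ((0, 0), (1, 1))
    print(minimize(obs_det, [short], long_, UNIVERSE))   # the longer trace replaces its prefix
    print(minimize(obs_det, [long_], short, UNIVERSE))   # the prefix is redundant
```

Here a trace dominates its own prefix, because it rejects every partner trace that the prefix rejects, so storing the longer trace alone preserves the ability to report a violation within this universe.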

**Specification Analysis: Decreasing Running Time.** RVHyper uses a *specification analysis*, which is a preprocessing step that analyzes the HyperLTL formula under consideration. RVHyper detects whether a formula is (1) *symmetric*, i.e., we halve the number of instantiated monitors, (2) *transitive*, i.e., we reduce the number of instantiated monitors to two, or (3) *reflexive*, i.e., we can omit the self comparison of traces [7].

*Symmetry* is especially interesting because many information flow policies satisfy this property. Consider, for example, observational determinism: ∀π. ∀π′. (O<sub>π</sub> = O<sub>π′</sub>) W (I<sub>π</sub> ≠ I<sub>π′</sub>). RVHyper detects symmetry by translating this formula to a formula that is unsatisfiable if there exists no pair of traces which violates the symmetry condition: ∃π. ∃π′. ((O<sub>π</sub> = O<sub>π′</sub>) W (I<sub>π</sub> ≠ I<sub>π′</sub>)) ↮ ((O<sub>π′</sub> = O<sub>π</sub>) W (I<sub>π′</sub> ≠ I<sub>π</sub>)). If the resulting formula turns out to be unsatisfiable, RVHyper omits the symmetric instantiations of the monitor automaton, which turns out to be, especially in combination with RVHyper's *trace analysis*, a major optimization in practice [7].
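The effect of symmetry and reflexivity on the number of monitor instantiations can be illustrated by counting index pairs. This is a sketch for formulas with two universal quantifiers only; the function name `monitor_instances` and its flags are our own, and the transitive case is not covered.

```python
from itertools import combinations, product


def monitor_instances(num_traces, symmetric=False, reflexive=False):
    """Index pairs (a, b) for which a two-variable monitor is instantiated."""
    indices = range(num_traces)
    if symmetric:
        # One representative per unordered pair of distinct traces.
        pairs = list(combinations(indices, 2))
    else:
        pairs = [(a, b) for a, b in product(indices, repeat=2) if a != b]
    if not reflexive:
        # Self comparisons can only be omitted for reflexive formulas.
        pairs += [(a, a) for a in indices]
    return pairs


if __name__ == "__main__":
    for sym, refl in [(False, False), (True, False), (False, True), (True, True)]:
        count = len(monitor_instances(4, symmetric=sym, reflexive=refl))
        print(f"symmetric={sym}, reflexive={refl}: {count} instances")
```

For four stored traces, the count drops from 16 pairs to 10 for a symmetric formula and to 6 when the formula is also reflexive.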

**Implementation.** RVHyper<sup>1</sup> is written in C++. It uses *spot* for building the deterministic monitor automata and the *Buddy* BDD library for handling symbolic constraints. We use the HyperLTL satisfiability solver EAHyper [5,6] to determine whether the input formula is reflexive, symmetric, or transitive. Depending on those results, we omit redundant tuples in the monitoring algorithm.

### **3 Detecting Spurious Dependencies in Hardware Designs**

While HyperLTL has been applied to a range of domains, including security and information flow properties, we focus in the following on a classical verification problem, the independence of signals in hardware designs. We demonstrate how RVHyper can automatically detect such dependencies from traces generated from hardware designs.

<sup>1</sup> The implementation is available at https://react.uni-saarland.de/tools/rvhyper/.

**Input and Output.** The input to RVHyper is a set of traces where the propositions match the atomic propositions of the HyperLTL formula. For the following experiments, we generate a set of traces from the Verilog description of several example circuits by random simulation. If a set of traces violates the specification, RVHyper returns a counter example.

**Specification.** We consider the problem of detecting whether input signals influence output signals in hardware designs. We write *i* ↛ *o* to denote that the inputs *i* do not influence the outputs *o*. Formally, we specify this property as the following HyperLTL formula:

**Fig. 1.** mux circuit with black box

$$
\forall \pi_1 \forall \pi_2.\ (\mathbf{o}_{\pi_1} = \mathbf{o}_{\pi_2}) \; \mathcal{W} \left( \overline{\mathbf{i}}_{\pi_1} \neq \overline{\mathbf{i}}_{\pi_2} \right),
$$

where *ī* denotes all inputs except *i*. Intuitively, the formula asserts that for every pair of execution traces (π<sub>1</sub>, π<sub>2</sub>) the value of *o* has to be the same until there is a difference between π<sub>1</sub> and π<sub>2</sub> in the input vector *ī*, i.e., the inputs on which *o* may depend.
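For a single pair of traces, the independence formula can be checked directly. The sketch below is our own illustration rather than RVHyper's automata-based monitor: traces are lists of dictionaries mapping signal names to values, the function name `no_influence` and the signal names are hypothetical, and traces are compared on their common prefix only.

```python
def no_influence(t1, t2, ignored, outputs, inputs):
    """Check (o_π1 = o_π2) W (ī_π1 ≠ ī_π2) for one pair of traces, where ī
    consists of all inputs that are not listed in `ignored`."""
    others = [x for x in inputs if x not in ignored]
    for step1, step2 in zip(t1, t2):
        if any(step1[x] != step2[x] for x in others):
            return True      # the remaining inputs differ: nothing more to check
        if any(step1[y] != step2[y] for y in outputs):
            return False     # outputs differ although the remaining inputs agree
    return True


if __name__ == "__main__":
    # Two one-step traces of a hypothetical xor circuit o = a xor b.
    t1 = [{"a": 0, "b": 0, "o": 0}]
    t2 = [{"a": 1, "b": 0, "o": 1}]
    # Does a influence o? The traces agree on b but differ on o.
    print(no_influence(t1, t2, ignored=["a"], outputs=["o"], inputs=["a", "b"]))
```

A result of `False` for some pair of traces is a counter example to the claim that the ignored inputs do not influence the outputs.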

**Sample Hardware Designs.** We apply RVHyper to traces generated from the following hardware designs. Note that, since RVHyper observes traces and treats the system that generates the traces as a black box, the performance of RVHyper does not depend on the size of the circuit.

*Example 1 (* xor*).* As a first example, consider the xor function *o* = *i* ⊕ *i′*. In the corresponding circuit, every j-th output bit o<sub>j</sub> is only influenced by the j-th input bits i<sub>j</sub> and i′<sub>j</sub>.

*Example 2 (* mux*).* This example circuit is depicted in Fig. 1. There is a black box combinatorial circuit, guarded by a multiplexer that selects between the two input vectors *i* and *i′* and an inverse multiplexer that forwards the output of the black box either towards *o* or *o′*. Despite there being a syntactic dependency between *o* and *i′*, there is no semantic dependency, i.e., the output *o* depends solely on *i* and the selector signal.

When using the same example, but with a sequential circuit as black box, there may be information flow from the input vector *i′* to the output vector *o* because the state of the latches may depend on it. We construct such a circuit that leaks information about *i′* via its internal state.

*Example 3 (*counter*).* Our last example is a binary counter with two input control bits *incr* and *decr* that increment and decrement the counter. The corresponding Verilog design is shown in Fig. 2. The counter has a single output, namely a signal that is set to one when the counter value overflows. Both inputs influence the output, but the timing of the overflow depends on the number of counter bits.

```
module counter(increase, decrease, overflow);
  input increase;
  input decrease;
  output overflow;

  reg [2:0] counter;

  // overflow is raised when the 3-bit counter is at its maximum
  // and a further increment is requested
  assign overflow = (counter == 3'b111 && increase && !decrease);

  initial
  begin
    counter = 0;
  end

  always @($global_clock)
  begin
    if (increase && !decrease)
      counter = counter + 1;
    else if (!increase && decrease && counter > 0)
      counter = counter - 1;
    else
      counter = counter;
  end
endmodule
```
**Fig. 2.** Verilog description of Example 3 (counter).
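To illustrate how traces of the counter might be produced for monitoring, the following sketch mirrors the Verilog design of Fig. 2 in Python. It is an assumed behavioral model, not the simulation setup used for the experiments: the function name `simulate_counter`, the `bits` parameter, and the trace format (one dictionary per clock tick) are our own choices, and we assume one input sample per tick of the global clock.

```python
def simulate_counter(inputs, bits=3):
    """Simulate the counter on a sequence of (increase, decrease) pairs
    and return one dictionary of signal values per clock tick."""
    top = (1 << bits) - 1            # 3'b111 for the default width
    counter = 0
    trace = []
    for increase, decrease in inputs:
        # Combinational output, computed from the current counter value.
        overflow = counter == top and bool(increase) and not decrease
        trace.append({
            "increase": bool(increase),
            "decrease": bool(decrease),
            "overflow": overflow,
        })
        # State update, as in the always block of the Verilog design.
        if increase and not decrease:
            counter = (counter + 1) & top    # the register wraps around
        elif not increase and decrease and counter > 0:
            counter -= 1
    return trace


if __name__ == "__main__":
    trace = simulate_counter([(1, 0)] * 9)
    print([step["overflow"] for step in trace])
```

With the default width of three bits, the overflow signal is first raised on the eighth consecutive increment, which reflects the remark above that the timing of the overflow depends on the number of counter bits.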

**Table 1.** Results of RVHyper on traces generated from circuit instances. Every instance was run 10 times with different seeds and the average is reported.


**Results.** The results of multiple random simulations are given in Table 1. Despite the high complexity of the monitoring problem, RVHyper is able to scale up to thousands of input traces with millions of monitor instantiations (cf. Algorithm 1). RVHyper's optimizations, i.e., keeping only a minimal set of traces and reducing the number of instances by the specification analysis, are a key factor in these results. For the two instances where the property is satisfied (xor and mux), RVHyper has not found a violation in any of the runs. For instances where the property is violated, RVHyper is able to find counter examples. While counter examples can be found quickly for xor and mux2, the counter instances need more traces since the chance of finding a violating pair of traces is lower.

#### **4 Conclusion**

RVHyper monitors a running system for violations of a HyperLTL specification. The functionality of RVHyper thus complements model checking tools for HyperLTL, like MCHyper [8], and tools for satisfiability checking, like EAHyper [6]. RVHyper is in particular useful during the development of a HyperLTL specification, where it can be used to check the HyperLTL formula on sample traces without the need for a complete model. Based on the feedback of the tool, the user can refine the HyperLTL formula until it captures the intended policy.

### **References**



# **The Refinement Calculus of Reactive Systems Toolset**

Iulia Dragomir<sup>1(✉)</sup>, Viorel Preoteasa<sup>2(✉)</sup>, and Stavros Tripakis<sup>2,3(✉)</sup>

> <sup>1</sup> Univ. Grenoble Alpes, CNRS, Grenoble INP, VERIMAG, Grenoble, France iulia.dragomir@univ-grenoble-alpes.fr <sup>2</sup> Aalto University, Espoo, Finland <sup>3</sup> University of California, Berkeley, USA

**Abstract.** We present the Refinement Calculus of Reactive Systems Toolset, an environment for compositional modeling and reasoning about reactive systems, built on top of Isabelle, Simulink, and Python.

### **1 Introduction**

The *Refinement Calculus of Reactive Systems* (RCRS) is a compositional framework for modeling and reasoning about reactive systems. RCRS has been inspired by component-based frameworks such as interface automata [3] and has its origins in the theory of relational interfaces [14]. The theory of RCRS has been introduced in [13] and is thoroughly described in [11].

RCRS comes with a publicly available toolset, the *RCRS toolset* (Fig. 1), which consists of:


An extended version of this paper contains an additional six-page appendix describing a demo of the RCRS toolset [6]. The extended paper can also be found in a figshare repository [7]. The figshare repository contains all data (code and models) required to reproduce all results of this paper as well as of [6]: see Section "Data Availability Statement" for more details. The RCRS toolset can be downloaded also from the RCRS web page: http://rcrs.cs.aalto.fi/.

This work has been supported by the Academy of Finland and the U.S. National Science Foundation (awards #1329759 and #1139138).

I. Dragomir—Partially supported by the H2020 Programme SRC ESROCOS and ERGO projects.

Grenoble INP—Institute of Engineering Univ. Grenoble Alpes.

© The Author(s) 2018

D. Beyer and M. Huisman (Eds.): TACAS 2018, LNCS 10806, pp. 201–208, 2018. https://doi.org/10.1007/978-3-319-89963-3\_12

**Fig. 1.** The RCRS toolset.

### **2 Modeling Systems in RCRS**

RCRS provides a language of *components* to model systems in a modular fashion. Components can be either *atomic* or *composite*. Here are some examples of atomic RCRS components:

```
definition "Id = [: x ↝ y . y = x :]"
definition "Add = [: (x, y) ↝ z . z = x + y :]"
definition "Constant c = [: x::unit ↝ y . y = c :]"
definition "UnitDelay = [: (x,s) ↝ (y,s') . y = s ∧ s' = x :]"
definition "SqrRoot = {. x . x ≥ 0 .} o [- x ↝ √x -]"
definition "NonDetSqrt = {. x . x ≥ 0 .} o [: x ↝ y . y ≥ 0 :]"
definition "ReceptiveSqrt = [: x ↝ y . x ≥ 0 −→ y = √x :]"
definition "A = {. x . □♦x .} o [: x ↝ y . □♦y :]"
```
Id models the identity function: it takes input *x* and returns *y* such that *y* = *x*. Add returns the sum of its two inputs. Constant is parameterized by c, takes no input (equivalent to saying that its input variable is of type unit), and returns an output which is always equal to c. UnitDelay is a *stateful* component: s is the current-state variable and s' is the next-state variable. SqrRoot is a *non-input-receptive* component: its input x is required to satisfy x ≥ 0. (SqrRoot may be considered non-atomic as it is defined as the serial composition of two predicate transformers – see Sect. 3.) NonDetSqrt is a *non-deterministic* version of SqrRoot: it returns an arbitrary (but non-negative) y, and not necessarily the square root of x. ReceptiveSqrt is an input-receptive version of SqrRoot: it accepts negative inputs, but may return an arbitrary output for such inputs. RCRS also allows components to be described using the temporal logic QLTL, an extension of LTL with quantifiers [11]. An example is component A above. A accepts an infinite input sequence of x's, provided x is infinitely often true, and returns a (non-deterministic) output sequence which satisfies the same property.

Composite components are formed by composing other (atomic or composite) components using three primitive composition operators, as illustrated in Fig. 2: *C* o *C′* (in series) connects outputs of *C* to inputs of *C′*; *C* \*\* *C′* (in parallel) "stacks" *C* and *C′* "on top of each other"; and feedback(*C*) connects the first output of *C* to its first input. These operators are sufficient to express any block diagram, as described in Sect. 4.

**Fig. 2.** The three composition operators of RCRS.
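As a rough intuition for serial and parallel composition, the following sketch models deterministic components as Python functions from input tuples to output tuples. This is only an analogy under our own assumptions and not the RCRS semantics, which is defined in terms of predicate and property transformers: the names `serial`, `parallel`, `ident`, `add`, and `sqr_root` are hypothetical, a violated input requirement is modeled as an exception, and the feedback operator is omitted because it requires resolving the dependency between the first output and the first input.

```python
import math


def serial(c1, c2):
    """Serial composition: the outputs of c1 become the inputs of c2."""
    return lambda *xs: c2(*c1(*xs))


def parallel(c1, arity1, c2):
    """Parallel composition: c1 reads the first `arity1` inputs, c2 the rest;
    the outputs are concatenated."""
    return lambda *xs: c1(*xs[:arity1]) + c2(*xs[arity1:])


def ident(x):
    return (x,)


def add(x, y):
    return (x + y,)


def sqr_root(x):
    # Non-input-receptive: the input is required to satisfy x >= 0.
    if x < 0:
        raise ValueError("input requirement x >= 0 violated")
    return (math.sqrt(x),)


if __name__ == "__main__":
    print(serial(add, sqr_root)(9, 16))          # add, then take the square root
    print(parallel(ident, 1, add)(1, 2, 3))      # identity stacked on top of add
```

Non-deterministic components such as NonDetSqrt have no counterpart in this functional picture, which is one reason why RCRS works with relations and predicate transformers instead.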

### **3 The Implementation of RCRS in Isabelle**

RCRS is fully implemented in the Isabelle theorem prover. The RCRS implementation currently consists of 22 Isabelle *theories* (.thy files), totalling 27588 lines of Isabelle code. Some of the main theories are described next.

Theory Refinement.thy (1209 lines) contains a standard implementation of refinement calculus [1]. Systems are modeled as monotonic predicate transformers [4] with a weakest precondition interpretation. Within this theory we implemented non-deterministic and deterministic update statements, assert statements, parallel composition, refinement and other operations, and proved necessary properties of these.

Theory RefinementReactive.thy (1144 lines) extends Refinement.thy to reactive systems by introducing predicates over infinite traces in addition to predicates over values, and *property* transformers in addition to predicate transformers [11,13].

Theory Temporal.thy (788 lines) implements a semantic version of QLTL, where temporal operators are interpreted as predicate transformers. For example, a unary temporal operator such as □ (always), when applied to the predicate on infinite traces (*x >* 0) : (nat → real) → bool, returns another predicate on infinite traces □(*x >* 0) : (nat → real) → bool. Temporal operators have been implemented to be polymorphic in the sense that they apply to predicates over an arbitrary number of variables.

Theory Simulink.thy (873 lines) defines a subset of the basic blocks in the Simulink library as RCRS components (at the time of writing, 48 Simulink block types can be handled). In addition to discrete-time blocks, we can handle continuous-time blocks with a fixed-step forward Euler integration scheme. For example, Simulink's integrator block can be defined in two equivalent ways as follows:

```
definition "Integrator dt = [- (x, s) ↝ (s, s + x*dt) -]"
definition "Integrator dt = [: (x, s) ↝ (y, s'). y = s ∧ s' = s + x*dt :]"
```
The syntax [- x ↝ *f*(x) -] assumes that *f* is a function, whereas [: :] can also be used for relations (i.e., non-deterministic systems). Using the former instead of the latter to describe deterministic systems helps the Analyzer perform simplifications (see Sect. 5).
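The deterministic Integrator definition can be read as one forward Euler step: the output is the current state, and the next state adds `x*dt`. The following Python sketch mirrors that reading. The names `integrator_step` and `simulate` are hypothetical; they are not part of the RCRS toolset or of the simulation code it generates.

```python
from typing import Iterable, List, Tuple


def integrator_step(x: float, s: float, dt: float) -> Tuple[float, float]:
    """One forward Euler step: y = s and s' = s + x*dt."""
    y = s
    s_next = s + x * dt
    return y, s_next


def simulate(inputs: Iterable[float], s0: float, dt: float) -> Tuple[List[float], float]:
    """Run the integrator over an input sequence, starting from state s0."""
    s = s0
    outputs = []
    for x in inputs:
        y, s = integrator_step(x, s, dt)
        outputs.append(y)
    return outputs, s


# Integrating the constant input 1.0 over ten steps of size 0.1.
outputs, final_state = simulate([1.0] * 10, s0=0.0, dt=0.1)
```

The final state approximates the integral of the input over one time unit; the output lags by one step because the block emits the state before updating it.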

Theory SimplifyRCRS.thy (2175 lines) implements several of the Analyzer's procedures. In particular, it contains a simplification procedure which reduces composite RCRS components into atomic ones (see Sect. 5).

In addition to the above, there are several theories containing a proof of correctness of our block-diagram translation strategies (see Sect. 4 and [10]), dealing with Simulink types [12], generating Python simulation code, and many more. A detailed description of all these theories and graphs depicting their dependencies are included in the documentation of the toolset.

The syntax of RCRS components is implemented in Isabelle using a *shallow embedding* [2]. This has the advantage of all datatypes and other mechanisms of Isabelle (e.g., renaming) being available for component specification, but also the disadvantage of not being able to express properties and simplifications of the RCRS language within Isabelle, as discussed in [11]. A *deep embedding*, in which the syntax of components is defined as a datatype of Isabelle, is possible, and is left as an open future work direction.

### **4 The Translator**

The Translator, called simulink2isabelle, translates *hierarchical block diagrams* (HBDs), and in particular Simulink models, into RCRS theories [5]. The Translator (implemented in about 7100 lines of Python code) takes as input a Simulink model (.slx file) and a list of options and generates as output an Isabelle theory (.thy file). The output file contains: (1) the definition of all instances of basic blocks in the Simulink diagram (e.g., all Adders, Integrators, Constants, etc.) as atomic RCRS components; (2) the bottom-up definition of all subdiagrams as composite RCRS components; (3) calls to simplification procedures; and (4) theorems stating that the resulting simplified components are equivalent to the original ones. The .thy file may also contain additional content depending on user options as explained below.

As shown in [5], there are many possible ways to translate a block diagram into an algebra of components with the three primitive composition operators of RCRS. This means that step (2) above is not unique. simulink2isabelle implements the several translation strategies proposed in [5] as user options.

For example, when run on the Simulink diagram of Fig. 3, the Translator produces a file similar to the one shown in Fig. 4. IC Model and FP Model are composite RCRS components generated automatically w.r.t. two different translation strategies, implemented by user options -ic and -fp. The simplify RCRS construct is explained in Sect. 5 that follows.

**Fig. 3.** A Simulink diagram.

Other user options to the Translator include: whether to flatten the input diagram, optional typing information for wires, and whether to generate, in addition to the top-level STS component, a QLTL component representing the temporal behavior of the system. The user can also ask the Translator to generate: (1) components w.r.t. all translation strategies; (2) the corresponding theorems showing that these components are all semantically equivalent; and (3) Python simulation scripts for the top-level component.

**Fig. 4.** Auto-generated Isabelle theory for the Simulink diagram of Fig. 3

### **5 The Analyzer**

The Analyzer is a set of procedures implemented on top of Isabelle and ML, the programming language of Isabelle. These procedures implement functionalities such as simplification, compatibility checking, and refinement checking. Here we describe the main functionalities, implemented by the simplify RCRS construct. As illustrated in Fig. 4, the general usage of this construct is simplify RCRS "Model = C" "in" "out", where C is a (generally composite) component and in, out are (tuples of) names for its input and output variables. When such a statement is executed in Isabelle, it performs the following steps: (1) It creates the definition Model = C. (2) It *expands* C, meaning that it replaces all atomic components and all composition operators in C with their definitions. This results in an Isabelle expression E. E is generally a complicated expression, containing formulas with quantifiers, case expressions for tuples, function compositions, and several other operators. (3) It *simplifies* E by eliminating quantifiers, renaming variables, and performing several other simplifications. The simplified expression, F, is of the form {.*p*.} o [:*r*:], where *p* is a predicate on input variables and *r* is a relation on input and output variables. That is, F is an atomic RCRS component. (4) It generates a theorem stating that Model is semantically equivalent to F, together with the mechanized proof of this theorem (in Isabelle). Note that the execution by the Analyzer of the .thy file generated by the Translator is fully automatic, even though Isabelle generally requires human interaction. This is because the theory generated by the Translator contains all declarations (equalities, rewriting rules, etc.) necessary for the Analyzer to produce the simplifications and their mechanized proofs without user interaction.

For example, when the theory in Fig. 4 is executed, the following theorem is generated and proved automatically:

$$\mathsf{Model} \;=\; [\,\ldots \leadsto \ldots\,]$$

where Model is either IC Model or FP Model. The rightmost expression is the automatically generated simplification of the top-level system to an atomic RCRS component.

If the model contains *incompatibilities*, where for instance the input condition of a block like SqrRoot cannot be guaranteed by the upstream diagram, the top-level component automatically simplifies to ⊥ (i.e., false). Thus, in this usage scenario, RCRS can be seen as a static analysis and behavioral type checking and inference tool for Simulink.
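The way an unsatisfiable input condition propagates through a serial composition can be sketched in Python for deterministic components given as a pair of an input condition and a function. This is a simplified illustration under stated assumptions: the `Comp` class and `serial` helper are hypothetical, and the sketch only evaluates conditions at sample points, whereas the Analyzer simplifies them symbolically in Isabelle.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass
class Comp:
    pre: Callable[[float], bool]   # input condition p
    fn: Callable[[float], float]   # deterministic input/output function


def serial(c1: Comp, c2: Comp) -> Comp:
    """Run c1 then c2: an input is legal if c1 accepts it and c1's output is legal for c2."""
    return Comp(
        pre=lambda x: c1.pre(x) and c2.pre(c1.fn(x)),
        fn=lambda x: c2.fn(c1.fn(x)),
    )


sqr_root = Comp(pre=lambda x: x >= 0, fn=math.sqrt)
square = Comp(pre=lambda x: True, fn=lambda x: x * x)
below_zero = Comp(pre=lambda x: True, fn=lambda x: -x * x - 1)

compatible = serial(square, sqr_root)        # x*x >= 0 holds for every x
incompatible = serial(below_zero, sqr_root)  # -x*x - 1 >= 0 holds for no x

samples = (-2.0, 0.0, 3.0)
```

For `incompatible`, the composed input condition is false at every sample, which corresponds to the composite simplifying to ⊥.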

### **6 Case Study**

We have used the RCRS toolset on several case studies, the most significant of which is a real-world benchmark provided by Toyota [8]. The benchmark consists of a set of Simulink diagrams modeling a Fuel Control System.<sup>1</sup> A typical diagram in this suite contains 3 levels of hierarchy, 104 Simulink blocks in total (of which 8 are subsystems), and 101 wires (of which 8 are feedbacks, the most complex composition operator in RCRS). Using the Translator on this diagram results in a .thy file of 1671 lines and 57037 characters. Translation time is negligible. The Analyzer simplifies this model to a top-level atomic STS component with no inputs, 7 (external) outputs and 14 state variables (note that all internal wires have been automatically eliminated in this top-level description). Simplification takes approximately 15 seconds and generates a formula which is 8337 characters long. The formula is consistent (not false), which proves statically that the original Simulink diagram has no incompatibilities. More details about the case study can be found in [5,6].

<sup>1</sup> We downloaded the Simulink models from https://cps-vo.org/group/ARCH/benchmarks. One of those models is made available in the figshare repository [7] (see also Section "Data Availability Statement").

### **7 Data Availability Statement**

All results mentioned in this paper as well as in the extended version of this paper [6] are fully reproducible using the code, data, and instructions available in the figshare repository: https://doi.org/10.6084/m9.figshare.5900911.v1.

The figshare repository contains the full implementation of the RCRS toolset, including the formalization of RCRS in Isabelle, the Analyzer, the RCRS Simulink library, and the Translator. The figshare repository also contains sample Simulink models, including the Toyota model discussed in Sect. 6, a demo file named RCRS Demo.thy, and detailed step-by-step instructions on how to conduct a demonstration and how to reproduce the results of this paper. Documentation on RCRS is also provided.

The figshare repository provides a snapshot of RCRS as of February 2018. Further developments of RCRS will be reflected on the RCRS web page: http://rcrs.cs.aalto.fi/.

#### **References**




**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Static and Dynamic Program Analysis

# **TESTOR: A Modular Tool for On-the-Fly Conformance Test Case Generation**

Lina Marsso, Radu Mateescu, and Wendelin Serwe

Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP\*, LIG, 38000 Grenoble, France Lina.Marsso@inria.fr

**Abstract.** We present TESTOR, a tool for on-the-fly conformance test case generation, guided by test purposes. Concretely, given a formal specification of a system and a test purpose, TESTOR automatically generates test cases, which assess using black box testing techniques the conformance to the specification of a system under test. In this context, a test purpose describes the goal states to be reached by the test and enables one to indicate parts of the specification that should be ignored during the testing process. Compared to the existing tool TGV, TESTOR has a more modular architecture, based on generic graph transformation components, is capable of extracting a test case completely on the fly, and enables a more flexible expression of test purposes, taking advantage of the multiway rendezvous. TESTOR has been implemented on top of the CADP verification toolbox, evaluated on three published case-studies and more than 10000 examples taken from the non-regression test suites of CADP.

### **1 Introduction**

Model-Based Testing [7] is a validation technique taking advantage of a model of a system (both requirements and behavior) to automate the generation of relevant test cases. This technique is suitable for complex industrial systems, such as embedded systems [45] and automotive software [35]. Using formal models for testing is required for certification of safety-critical systems [36]. Conformance testing aims at extracting from a formal model of a system a set of test cases to assess whether an actual implementation of the system under test (SUT) conforms to the model, using black-box testing techniques (i.e., without knowledge of the actual code of the SUT). This approach is particularly suited for nondeterministic concurrent systems, where the behavior of the SUT can be observed and controlled by a tester only via dedicated interfaces, named points of control and observation.

Often, the formal model is an IOLTS (Input/Output Labeled Transition System), where transitions between states of the system are labeled with an action classified as input, output, or internal (i.e., unobservable, usually denoted by τ). In this setting, the most prominent conformance relation is input-output conformance (**ioco**) [39,41]. The theory underlying **ioco** is well established, implemented in several tools [1,2,22,25,28], and still actively used, as witnessed by a series of recent case studies [9,10,20,27,38].

\* Institute of Engineering Univ. Grenoble Alpes.

© The Author(s) 2018

D. Beyer and M. Huisman (Eds.): TACAS 2018, LNCS 10806, pp. 211–228, 2018. https://doi.org/10.1007/978-3-319-89963-3\_13

As regards asynchronous systems, i.e., systems consisting of concurrent processes with message-passing communication, there exist two different approaches to model-based conformance testing: *coverage-oriented approaches* run the test(s) to stimulate the SUT until a coverage goal has been reached, whereas *test purpose guided approaches* use test suites, each test of which terminates with a verdict (passed, failed, or inconclusive). The generation of tests from the model can be carried out *offline*, before executing them against the SUT, or *online* [28] during their execution, by combining the exploration of the model and the interaction with the SUT.

In this paper, we present TESTOR, a tool for on-the-fly conformance test case generation guided by test purposes, which, following the approach of TGV [25], characterize some state(s) of the model as accepting. The generated test cases are automata that attempt to drive a SUT towards these states. TESTOR extends the algorithms of TGV to extract test cases completely on the fly (i.e., during test case execution against the SUT), making TESTOR suitable for online testing. TESTOR is constructed following a modular architecture based on generic, recent, and optimized graph manipulation components. This also makes the description of test purposes more convenient, by replacing the specific synchronous product of TGV and taking advantage of the multiway rendezvous [18,23], a powerful primitive to express communication and synchronization among a set of distributed processes. TESTOR was built on top of the OPEN/CAESAR [15] generic environment for on-the-fly graph manipulation provided by the CADP [16] verification toolbox.

The remainder of the paper is organized as follows. Section 2 recalls the essential notions of the underlying theory. Section 3 presents the architecture, main algorithms, and implementation of TESTOR, and gives some examples. Section 4 describes various experiments to validate TESTOR and compare it to TGV. Section 5 compares TESTOR to existing test generation approaches. Finally, Sect. 6 gives some concluding remarks and future work directions.

### **2 Background: Essential Definitions of [25]**

Conformance testing checks that a SUT behaves according to a formal reference model (M), which is used as an oracle. We use Input/Output Labeled Transition Systems (IOLTS) [25] to represent the behavior of the model M. We assume that the behavior of the SUT can also be represented as an IOLTS, even if it is unknown (the so-called testing hypothesis [25]). An IOLTS $(Q, A, T, q_0)$ consists of a set of states $Q$, a set of actions $A$, a transition relation $T \subseteq Q \times A \times Q$, and an initial state $q_0 \in Q$. The set of actions is partitioned into $A = A_I \cup A_O \cup \{\tau\}$, where $A_I$ and $A_O$ are the subsets of input and output actions, and $\tau$ is the internal (unobservable) action. A transition $(q_1, b, q_2) \in T$ (also noted $q_1 \xrightarrow{b} q_2$) indicates that the system can move from state $q_1$ to state $q_2$ by performing action $b$. Input (resp. output) actions are noted ?a (resp. !a). In the sequel, we consider the same running example as [25], whose IOLTS model M is shown on Fig. 1(a).

**Fig. 1.** Example of test case selection (taken from [25])

Input actions of the SUT are controllable by the environment, whereas output actions are only observable. Testing allows one to observe the execution traces of the SUT, and also to detect *quiescence*, i.e., the presence of deadlocks (states without successors), outputlocks (states without outgoing output actions), or livelocks (cycles of internal actions). The quiescence present in an IOLTS L (either the model M or the SUT) is modeled by a *suspension automaton* Δ(L), an IOLTS obtained from L by adding self-loops labeled by a special output action δ on the quiescent states. The SUT conforms to the model M modulo the **ioco** relation [40] if after executing each trace of Δ(M), the suspension automaton Δ(SUT) exhibits only those outputs and quiescences that are allowed by the model. Since two sequences having the same observable actions (including quiescence) cannot be distinguished, the suspension automaton Δ(M) must be determinized before generating tests.
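The construction of a suspension automaton can be sketched in Python on an explicit transition set. The encoding is an assumption made for illustration: transitions are `(source, label, target)` triples, inputs start with `?`, outputs with `!`, the internal action is the string `"tau"`, and δ is written `"!delta"`. A state is treated as quiescent if it has neither an outgoing output nor an outgoing τ-transition (deadlock or outputlock), or if it lies on a τ-cycle (livelock).

```python
TAU = "tau"
DELTA = "!delta"  # special output action marking quiescence


def on_tau_cycle(q, trans):
    """True if q can reach itself through one or more tau transitions."""
    stack = [d for (s, l, d) in trans if s == q and l == TAU]
    seen = set()
    while stack:
        r = stack.pop()
        if r == q:
            return True
        if r not in seen:
            seen.add(r)
            stack.extend(d for (s, l, d) in trans if s == r and l == TAU)
    return False


def suspend(states, trans):
    """Return the transitions of the suspension automaton: trans plus delta self-loops."""
    quiescent = set()
    for q in states:
        labels = [l for (s, l, d) in trans if s == q]
        no_output_or_tau = not any(l.startswith("!") or l == TAU for l in labels)
        if no_output_or_tau or on_tau_cycle(q, trans):
            quiescent.add(q)
    return set(trans) | {(q, DELTA, q) for q in quiescent}


# State 0 waits for an input, state 1 can emit, states 2 and 3 form a tau-cycle.
example = {(0, "?a", 1), (1, "!x", 2), (2, TAU, 3), (3, TAU, 2)}
suspended = suspend({0, 1, 2, 3}, example)
```

The sketch is eager and quadratic; it only illustrates the definition and makes no claim about how TGV or TESTOR compute Δ(L).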

The test generation technique of TGV is based upon *test purposes*, which allow one to guide the selection of test cases. A test purpose for a model $M = (Q^M, A^M, T^M, q_0^M)$ is a deterministic and complete (i.e., in each state all actions are accepted) IOLTS $TP = (Q^{TP}, A^{TP}, T^{TP}, q_0^{TP})$, with the same actions as the model, $A^{TP} = A^M$. TP is equipped with two sets of trap states, $\mathit{Accept}^{TP}$ and $\mathit{Refuse}^{TP}$, which are used to select desired behaviors and to cut the exploration of M, respectively. In the TP shown on Fig. 1(b), the desired behavior consists of an action !y followed by !z and is specified by the accepting state $q_3$; notice that the occurrence of an action !z before a !y is forbidden by the refusal state $q_2$. In a TP, a special transition of the form $q \xrightarrow{*} q'$ is an abbreviation for the complement set of all other outgoing transitions of $q$. These \*-transitions facilitate the definition of a test purpose (which has to be a complete IOLTS) by avoiding the need to explicitly enumerate all possible actions for all states. Test purposes are used to mark the accepting and refusal states in the IOLTS of the model M. In TGV, this annotation is computed by a synchronous product [25, Definition 8] $SP = M \times TP$. Notice that SP preserves all behaviors of the model M because TP is complete and the synchronous product takes into account the special \*-transitions. When computing SP, TGV implicitly adds a self-looping \*-transition to each state of the TP with an incomplete set of outgoing transitions. To keep only the visible behaviors and quiescence, SP is suspended and determinized, leading to $SP_{vis} = \mathit{det}(\Delta(SP))$. Figure 1(c) shows an excerpt of $SP_{vis}$ limited to the first accepting and refusal states reachable from $q_0^{SP_{vis}}$.
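The treatment of \*-transitions in the synchronous product can be sketched as follows in Python. The data layout is hypothetical: the model is a set of `(source, label, target)` triples, the test purpose is a dictionary from states to `{label: target}` maps in which the key `"*"` stands for the \*-transition, and a state without a matching entry gets the implicit self-loop. Cutting the exploration at refusal states is left out of this sketch.

```python
from collections import deque


def tp_step(tp, t, label):
    """TP successor on label: explicit transition, else '*', else implicit self-loop."""
    moves = tp.get(t, {})
    if label in moves:
        return moves[label]
    return moves.get("*", t)


def sync_product(m_trans, m_init, tp, tp_init):
    """Reachable part of M x TP, as a set of ((m, t), label, (m2, t2)) transitions."""
    init = (m_init, tp_init)
    seen = {init}
    todo = deque([init])
    product = set()
    while todo:
        m, t = todo.popleft()
        for (src, label, dst) in m_trans:
            if src != m:
                continue
            succ = (dst, tp_step(tp, t, label))
            product.add(((m, t), label, succ))
            if succ not in seen:
                seen.add(succ)
                todo.append(succ)
    return product


# Hypothetical model and test purpose: accept "!y", refuse "!z" after the input.
model = {("m0", "?a", "m1"), ("m1", "!y", "m2"), ("m1", "!z", "m3")}
purpose = {"t0": {"!y": "accept", "!z": "refuse"}}
sp = sync_product(model, "m0", purpose, "t0")
```

Because every model transition finds a TP successor, the product keeps all behaviors of the model, which is the property the completeness requirement on TP is meant to guarantee.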

A *test case* is an IOLTS $TC = (Q^{TC}, A^{TC}, T^{TC}, q_0^{TC})$ equipped with three sets of trap states $\mathbf{Pass} \cup \mathbf{Fail} \cup \mathbf{Inconc} \subseteq Q^{TC}$ denoting verdicts. The actions of TC are partitioned into the subsets $A_I^{TC}$ and $A_O^{TC}$.<sup>1</sup> A test case TC must be *controllable*, meaning that in every state, no choice is allowed between two inputs or an input and an output (i.e., the tester must either inject a single input to the SUT, or accept all the outputs of the SUT). Intuitively, a TC denotes a set of traces containing visible actions and quiescence that should be executable by the SUT to assess its conformance with the model M and a test purpose TP. From every state of the TC, a verdict must be reachable: **Pass** indicates that TP has been fulfilled, **Fail** indicates that the SUT does not conform to M, and **Inconc** indicates that correct behavior has been observed but TP cannot be fulfilled. An example of TC (dark gray states) is shown on Fig. 1(c). Pass verdicts correspond to accepting states (e.g., $q_{11}$). Inconclusive verdicts correspond either to refusal states (e.g., $q_4$ or $q_6$) or to states from which no accepting state is reachable (e.g., state $q_{10}$). Fail verdicts, not displayed on the figure, are reached from every state when the SUT exhibits an output action (or a quiescence) not specified in the TC (e.g., an action !z or a quiescence in state $q_1$).
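The controllability condition can be checked mechanically. The Python sketch below assumes the same hypothetical encoding as before (triples `(source, label, target)`, inputs prefixed by `?`, outputs by `!`) and rejects any state offering two inputs, or an input together with an output.

```python
def is_controllable(trans):
    """True if every state offers either a single input or only outputs."""
    labels_by_state = {}
    for (q, label, _target) in trans:
        labels_by_state.setdefault(q, []).append(label)
    for labels in labels_by_state.values():
        inputs = [l for l in labels if l.startswith("?")]
        mixes_input_and_output = bool(inputs) and len(labels) > len(inputs)
        if len(inputs) > 1 or mixes_input_and_output:
            return False
    return True


single_input_then_outputs = {("a", "?i", "b"), ("b", "!x", "c"), ("b", "!y", "d")}
two_inputs = {("a", "?i", "b"), ("a", "?j", "c")}
input_and_output = {("a", "?i", "b"), ("a", "!x", "c")}
```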

In general, there are several test cases that can be generated from a given model and test purpose. The union of these test cases forms the Complete Test Graph (CTG), which is an IOLTS having the same characteristics as a TC except for controllability. Figure 1(c) shows the CTG (light and dark gray states) corresponding to M and TP, which is not controllable (e.g., in state $q_5$ the two input actions ?a and ?b are possible). Formally, a CTG is the subgraph of $SP_{vis}$ induced by the set L2A (*lead to accept*) of states from which an accepting state is reachable, decorated with pass and inconclusive verdicts. A controllable TC exists iff the CTG is not empty, i.e., $q_0^{SP_{vis}} \in \mathrm{L2A}$ [25].

<sup>1</sup> In TGV [25], the actions of test cases are symmetric w.r.t. those of the model M and the SUT, i.e., $A_O^{TC} \subseteq A_I^{M}$ (TC emits only inputs of M) and $A_I^{TC} \subseteq A_O^{SUT} \cup \{\delta\}$ (TC captures outputs and quiescences of SUT). To avoid confusion, we consider here that inputs and outputs of TC are the same as those of M and SUT.

The execution of a TC against the SUT corresponds to a parallel composition TC || SUT with synchronization on common observable actions, verdicts being determined by the trap states reached by a maximal trace of TC || SUT, i.e., a trace leading to a verdict state. Quiescent livelock states (infinite sequences of internal actions in the SUT) are detected using timers, and lead to inconclusive verdicts. A TC may have cycles, in which case global timers are required to prevent infinite test executions.

### **3 TESTOR**

We present the architecture and implementation of TESTOR, its on-the-fly algorithm for test-case extraction, and show several ways of specifying test purposes.

#### **3.1 Architecture**

TESTOR takes as input a formal model (M), a test purpose (TP), and a predicate specifying the input/output actions of M. Depending on the chosen options, it produces as output either a complete test graph (CTG), or a test case (TC) extracted on the fly. TESTOR has a modular component-based architecture consisting of several on-the-fly IOLTS transformation components, interconnected according to the architecture shown on Fig. 2. The boxes represent transformation components and the arrows between them denote the implicit representations (*post* functions) of IOLTSs.

The first component produces the synchronous product (SP) between the model M and the test purpose TP. Following the conventions of TGV [25], the synchronous product supports \*-transitions and implements the implicit addition of self-looping \*-transitions. The next four reduction components progressively transform SP into $SP_{vis} = \mathit{det}(\Delta(SP))$ as follows: (i) τ-compression produces the suspension automaton Δ(SP) by squeezing the strongly connected components of τ-transitions and replacing them with δ-loops representing quiescence; (ii) τ-confluence eliminates redundant interleavings by giving priority to confluent τ-transitions, i.e., those whose neighbor transitions (going out from the same source state) do not bring new observational behavior; (iii) τ-closure computes the transitive reflexive closure on τ-transitions; (iv) the resulting τ-free IOLTS is determinized by applying the classical subset construction. The reduction by τ-compression is necessary for τ-confluence (which operates on IOLTSs without τ-cycles) and is also useful as a preprocessing step for τ-closure (whose algorithm is simpler in the absence of τ-cycles). Although τ-confluence is optional, it may drastically reduce the size of the IOLTS prior to τ-closure, therefore acting as an accelerator for the whole test selection procedure when SP contains large diamonds of τ-transitions produced by the interleavings of independent actions [31]. The first three reductions [31] are applied only if TESTOR detects the presence of τ-transitions in SP.
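Steps (iii) and (iv) can be illustrated with a compact Python sketch that combines τ-closure with the subset construction. It is an eager textbook version on an explicit transition set, not the on-the-fly OPEN/CAESAR components, and it skips τ-compression and τ-confluence; the function names and the `"tau"` label are assumptions of the sketch.

```python
from collections import deque

TAU = "tau"


def tau_closure(states, trans):
    """All states reachable from `states` through zero or more tau transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for (s, l, d) in trans:
            if s == q and l == TAU and d not in closure:
                closure.add(d)
                stack.append(d)
    return frozenset(closure)


def determinize(trans, init):
    """Subset construction over visible labels; returns (initial subset, transitions)."""
    start = tau_closure({init}, trans)
    seen = {start}
    todo = deque([start])
    det = set()
    while todo:
        subset = todo.popleft()
        visible = {l for (s, l, d) in trans if s in subset and l != TAU}
        for label in visible:
            targets = {d for (s, l, d) in trans if s in subset and l == label}
            succ = tau_closure(targets, trans)
            det.add((subset, label, succ))
            if succ not in seen:
                seen.add(succ)
                todo.append(succ)
    return start, det


# A tau step hides a choice: after "?a" the system is in state 2 or state 3.
lts = {(0, TAU, 1), (0, "?a", 2), (1, "?a", 3), (2, "!x", 4), (3, "!y", 4)}
start, det = determinize(lts, 0)
```

In the result, each state is a set of original states, matching the remark that states of the determinized IOLTS correspond to sets of states of the τ-free IOLTS.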

**Fig. 2.** Architecture of TESTOR

The determinization produces as output the post function of the IOLTS $SP_{vis}$, whose states correspond to sets of states of the τ-free IOLTS produced by τ-closure. $SP_{vis}$ is processed by the explorer component, which builds the CTG or the TC by computing the corresponding subgraph whose states are contained in L2A. The reachability of accepting states is determined on the fly by evaluating the PDL [14] formula $\varphi_{l2a} = \langle \mathsf{true}^{*} \rangle\, \mathit{accept}$ on the states visited by the explorer, where the atomic proposition *accept* denotes the accepting states. This check is done by translating the verification problem into a Boolean equation system (BES) and solving it on the fly using a BES solver component [32]. The synchronous product and the explorer are the only newly developed components; all the others (represented in gray on Fig. 2) were already available in the libraries of the OPEN/CAESAR [15] environment of CADP.

#### **3.2 On-the-Fly Test Selection Algorithm**

We describe below the algorithm used by the explorer component to extract the CTG or a (controllable) TC from the $SP_{vis}$ IOLTS on the fly.

Basically, the CTG is the subgraph of $SP_{vis}$ containing all states in L2A, extended with some states denoting verdicts. The accepting states (which are by definition part of L2A) correspond to pass verdicts. For every state $q \in \mathrm{L2A}$, the output transitions $q \xrightarrow{!a} q'$ with $q' \notin \mathrm{L2A}$ lead to inconclusive verdicts, and the output transitions other than those contained in $SP_{vis}$ lead to fail verdicts. To compute the CTG, the explorer component performs a forward traversal of $SP_{vis}$ and keeps the states $q \in \mathrm{L2A}$, i.e., those that satisfy the formula $\varphi_{l2a}$. The check $q \models \varphi_{l2a}$ is done by solving the variable $X_q$ of the minimal fixed point BES $\{X_q = (q \models \mathit{accept}) \vee \bigvee_{q \xrightarrow{b} q'} X_{q'}\}$ denoting the interpretation of $\varphi_{l2a}$ on $SP_{vis}$. The resolution is carried out on the fly using the algorithm for disjunctive BESs proposed in [32]. If the CTG is not empty (i.e., $q_0^{SP_{vis}} \models \varphi_{l2a}$), then it contains at least one controllable TC [25].
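The minimal fixed point that defines L2A can be computed by a simple global iteration, sketched below in Python. This is only meant to illustrate the equation for $X_q$: it is not the on-the-fly algorithm for disjunctive BESs of [32], and the function name and transition encoding are assumptions.

```python
def lead_to_accept(trans, accepting):
    """Least solution of X_q = accept(q) or (X_q2 for some transition q -> q2)."""
    l2a = set(accepting)
    changed = True
    while changed:
        changed = False
        for (q, _label, q2) in trans:
            if q2 in l2a and q not in l2a:
                l2a.add(q)
                changed = True
    return l2a


# State 3 is a dead end, so it cannot lead to the accepting state 4.
graph = {(0, "?a", 1), (1, "!y", 2), (1, "!z", 3), (2, "!z", 4)}
l2a = lead_to_accept(graph, accepting={4})
```

Starting from the accepting states and adding predecessors until nothing changes yields the least solution, since a state is only added once one of its successors is already known to lead to acceptance.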

The extraction of a TC uses a similar forward traversal as for generating the CTG, extended to ensure controllability, i.e., every state $q$ of TC either has only one outgoing input transition $q \xrightarrow{?a} q'$ with $q' \in \mathrm{L2A}$, or has all output transitions $q \xrightarrow{!a} q'$ of $SP_{vis}$ with $q' \in \mathrm{L2A}$. The essential ingredient for selecting the input transitions on the fly is the diagnostic generation for BESs [30], which provides, in addition to the Boolean value of a variable, also the minimal fragment (w.r.t. inclusion) of the BES illustrating the value of that variable. For a variable $X_q$ evaluated to true in the disjunctive BES underlying $\varphi_{l2a}$, the diagnostic (witness) is a sequence $X_q \xrightarrow{b_1} X_{q_1} \xrightarrow{b_2} \cdots \xrightarrow{b_k} X_{q_k}$ where $q_k \models \mathit{accept}$. This induces a sequence of transitions $q \xrightarrow{b_1} q_1 \xrightarrow{b_2} \cdots \xrightarrow{b_k} q_k$ in $SP_{vis}$ leading to an accepting state. Since all states $q, q_1, \ldots, q_k$ also belong to L2A, this diagnostic sequence is naturally part of the TC under construction.
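A diagnostic sequence of this kind can be imitated by a breadth-first search that returns a path to an accepting state. The Python sketch below is a stand-in for illustration: TESTOR obtains the witness from the BES diagnostic generation of [30], whereas `witness` is a hypothetical helper working on an explicit transition set with string state names.

```python
from collections import deque


def witness(trans, q, accepting):
    """Shortest transition sequence from q to an accepting state, or None."""
    parent = {q: None}
    todo = deque([q])
    while todo:
        s = todo.popleft()
        if s in accepting:
            path = []
            while parent[s] is not None:
                prev, label = parent[s]
                path.append((prev, label, s))
                s = prev
            return path[::-1]
        for (src, label, dst) in sorted(trans):
            if src == s and dst not in parent:
                parent[dst] = (src, label)
                todo.append(dst)
    return None


graph = {
    ("s0", "?a", "s1"),
    ("s1", "!y", "s2"),
    ("s1", "!x", "s3"),
    ("s2", "!z", "s4"),
}
path = witness(graph, "s0", accepting={"s4"})
```

Every state on the returned path can itself reach an accepting state, which is why such a sequence can be added to the test case without leaving L2A.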

More precisely, the TC extraction algorithm works as follows. If $q_0^{SP_{vis}} \models \varphi_{l2a}$, the diagnostic sequence for $q_0^{SP_{vis}}$ is inserted in the TC (otherwise the algorithm stops because the CTG is empty). For the TC illustrated on Fig. 1(c), this first diagnostic sequence is $q_0 \xrightarrow{?a} q_1 \xrightarrow{!y} q_5 \xrightarrow{?b} q_9 \xrightarrow{!z} q_{11}$. Then, the main loop consists in choosing an unexplored transition of the TC and processing it.


The insertion of a diagnostic sequence in the TC stops when it meets a state $q$ that already belongs to the TC, since by construction the TC already contains a sequence starting at $q$ and leading to an accepting state. This is the case, e.g., for the diagnostic sequence starting at state $q_5$ in the TC on Fig. 1(c). In this way, the TC is built progressively by inserting the diagnostic sequences produced for each of the encountered states in L2A.

During the forward traversal of $SP_{vis}$, the explorer component continuously interacts with the BES solver, which in turn triggers other forward explorations of $SP_{vis}$ to evaluate $\varphi_{l2a}$. The repeated invocations of the solver have a cumulated linear complexity in the size of the BES (and hence, the size of $SP_{vis}$), because the BES solver keeps its context in memory and does not recompute already solved Boolean variables [32].

#### **3.3 Implementation**

TESTOR is built upon the generic libraries of the OPEN/CAESAR [15] environment, in particular the on-the-fly reductions by τ-compression, τ-confluence and τ-closure [31], and the on-the-fly BES resolution [32]. The tool (available at http://convecs.inria.fr/software/testor) consists of 5022 lines of C and 1106 lines of shell script.

#### **3.4 Examples of Different Ways to Express a Test Purpose**

Consider an asynchronous implementation of the DES (Data Encryption Standard) [37]. In a nutshell, the DES is a block-cipher taking three inputs: a Boolean indicating whether encryption or decryption is requested, a 64-bit key, and a 64-bit block of data. For each triple of inputs, the DES computes the 64-bit (de)crypted data, performing sixteen iterations of the same cipher function, each iteration with a different 48-bit subkey extracted from the 64-bit key.

A natural TP for the DES is to search for a sequence corresponding to the encryption of a single data block, for instance 0x0123456789abcdef with key 0x133457799bbcdff1, the expected result of which is 0x85e813540f0ab405. Using the LNT language [8,17], one would be tempted to write this TP as the process PURPOSE1, simply containing the desired sequence of three inputs (on gates CRYPT, KEY, and DATA) followed by an output (on gate OUTPUT):

```
process PURPOSE1 [CRYPT: CB, KEY, DATA, OUTPUT: C64, T_ACCEPT: none] is
  CRYPT (true); -- input
  KEY (C_13345779_9bbcdff1); -- input
  DATA (C_01234567_89abcdef); -- input
  OUTPUT (C_85e81354_0f0ab405); -- output
  loop T_ACCEPT end loop
end process
```
Following the conventions of TGV, we mark accepting (respectively, refusal) states by a self-loop labeled with T\_ACCEPT (respectively, T\_REFUSE).

However, PURPOSE1 is not complete: e.g., initially only one action out of the possible set {CRYPT (**true**), CRYPT (**false**), KEY (C\_13345779\_9bbcdff1), ...} is specified. Thus, when computing the synchronous product with the model, PURPOSE1 is implicitly completed by self-loops labeled with "\*" (as in the TP shown on Fig. 1b), yielding a significantly more complex TC than expected. For instance, the implicit \*-transition in the initial state allows the tester to perform the sequence "CRYPT (**false**)**;** CRYPT (**true**)" rather than the expected first action "CRYPT (**true**)". To force the generation of a TC corresponding to the simple sequence, it is necessary to explicitly complete the TP with transitions to refusal states, as shown by the LNT process PURPOSE2, where gate OTHERWISE stands for the special label "\*":

```
process PURPOSE2 [CRYPT: CB, KEY, DATA, OUTPUT: C64, SUBKEY: C48,
                  T_ACCEPT, T_REFUSE, OTHERWISE: none] is
  select -- refuse any rendezvous but "CRYPT (TRUE)"
     CRYPT (true)
  [] OTHERWISE; loop T_REFUSE end loop
  end select;
  select -- refuse any rendezvous but "KEY (C_13345779_9BBCDFF1)"
     KEY (C_13345779_9BBCDFF1)
  [] OTHERWISE; loop T_REFUSE end loop
  end select;
  loop L in
    select -- refuse any rendezvous but on gates DATA and SUBKEY
       DATA (C_01234567_89ABCDEF); break L
    [] SUBKEY (?any BIT48)
    [] OTHERWISE; loop T_REFUSE end loop
    end select
  end loop;
  loop -- refuse any rendezvous but on gates OUTPUT and SUBKEY
    select -- test target is reached by a rendezvous on OUTPUT
       OUTPUT (C_85E81354_0F0AB405); loop T_ACCEPT end loop
    [] SUBKEY (?any BIT48)
    [] OTHERWISE; loop T_REFUSE end loop
    end select
  end loop
end process
```

Instead of using the dedicated synchronous product, it is also possible to take advantage of the multiway rendezvous [18,23] to compositionally annotate the model, relying on the LNT operational semantics [8, Appendix B] to cut undesired branches. For instance, the same effect as the synchronous product with PURPOSE2 can be obtained by skipping the left-most component "synchronous product" of Fig. 2, i.e., feeding the τ-reduction steps with the IOLTS described by the following LNT parallel composition:

```
par CRYPT, KEY, DATA, OUTPUT in
     DES [CRYPT, KEY, DATA, OUTPUT, SUBKEY]
  || PURPOSE1 [CRYPT, KEY, DATA, OUTPUT, T_ACCEPT]
end par
```
This approach based on the multiway rendezvous even supports data handling. For instance, to observe the data (variable D), the key (variable K), and whether an encryption or decryption is requested (variable C), and to verify the correctness of the result, one just has to replace the call to PURPOSE1 in the above parallel composition by a call to the process PURPOSE3, where DES (in the rendezvous "OUTPUT (DES (C, K, D))") denotes a function implementing the DES algorithm:

```
process PURPOSE3 [CRYPT: CB, KEY, DATA, OUTPUT: C64, T_ACCEPT: none] is
  var C: BOOL, D, K: BIT64 in
     CRYPT (?C);
     KEY (?K);
     DATA (?D);
     OUTPUT (DES (C, K, D));
     loop T_ACCEPT end loop
  end var
end process
```
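The difference between the implicitly completed PURPOSE1 and the explicitly completed PURPOSE2 can be illustrated with a small Python sketch. It is only a model of the completion rule, not of the synchronous product itself: the dictionary encoding, the abbreviated labels such as `KEY(k)`, and the functions `step` and `run` are hypothetical, and the data carried by SUBKEY is abstracted away.

```python
ACCEPT, REFUSE = "accept", "refuse"


def step(tp, state, action):
    """Move of the TP on `action`: an explicit transition if there is one,
    otherwise the explicit '*' transition, otherwise the implicit '*'
    self-loop added by completion."""
    moves = tp.get(state, {})
    if action in moves:
        return moves[action]
    return moves.get("*", state)


def run(tp, actions, initial=0):
    """Follow the TP along a trace until a verdict state is reached."""
    state = initial
    for action in actions:
        if state in (ACCEPT, REFUSE):
            break
        state = step(tp, state, action)
    return state


# PURPOSE1-like TP: only the desired sequence is specified.
purpose1 = {0: {"CRYPT(true)": 1}, 1: {"KEY(k)": 2},
            2: {"DATA(d)": 3}, 3: {"OUTPUT(r)": ACCEPT}}

# PURPOSE2-like TP: every other rendezvous leads to a refusal state,
# except SUBKEY, which is tolerated once the key has been given.
purpose2 = {0: {"CRYPT(true)": 1, "*": REFUSE},
            1: {"KEY(k)": 2, "*": REFUSE},
            2: {"DATA(d)": 3, "SUBKEY": 2, "*": REFUSE},
            3: {"OUTPUT(r)": ACCEPT, "SUBKEY": 3, "*": REFUSE}}

detour = ["CRYPT(false)", "CRYPT(true)", "KEY(k)", "DATA(d)", "OUTPUT(r)"]
direct = ["CRYPT(true)", "KEY(k)", "DATA(d)", "SUBKEY", "SUBKEY", "OUTPUT(r)"]
verdicts = {name: (run(tp, detour), run(tp, direct))
            for name, tp in (("PURPOSE1", purpose1), ("PURPOSE2", purpose2))}
```

Under these assumptions, the trace starting with "CRYPT(false)" still reaches the accepting state of the PURPOSE1-like TP, whereas the PURPOSE2-like TP refuses it immediately; both accept the direct trace.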
### **4 Experimental Evaluation**

TESTOR follows TGV's implementation of the **ioco**-based testing theory [39, 41], using the same IOLTS processing steps, adding only the τ-confluence reduction. For each step, TESTOR uses components developed, tested, and used in other tools for more than a decade. In this section, we focus on performance aspects and we compare TESTOR to TGV. For this purpose, we conducted several experiments with models and test purposes, both automatically generated and drawn from academic examples and realistic case studies.

For assessing the correctness of TESTOR, we checked that each TC is included in the CTG, and we compared the TCs and CTGs generated by TESTOR to those generated by TGV. The latter comparison required several additional steps, automated using shell scripts and a dedicated tool (about 300 lines of C code). First, we generated the LTS of each TP, applying appropriate renamings, because TGV expects the TP to be an explicit LTS, with accepting (resp. refusing) states marked by a self-looping transition labeled with ACCEPT (resp. REFUSE), and with the label "\*". Then, we modified the TC and CTG generated by TESTOR so that each label includes the information whether the label is an input or output, and which verdict state (if any) is reached by the corresponding transition. Using this approach, we found that the CTGs generated by both tools were strongly bisimilar. The same does not hold for all the TCs, because the tools may ensure controllability in different ways, leading to non-bisimilar, but correct TCs.

For each pair of model and TP, we measured the runtime and peak memory usage of computing a TC or CTG (using TESTOR and TGV), excluding the fixed cost of compiling the LNT code (model and TP) and generating the executable. The experiments presented in this paper were carried out using the Grid'5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations (see https://www.grid5000.fr). Concretely, we used the petitprince cluster located in Luxembourg, consisting of sixteen machines, each equipped with 2 Intel Xeon E5-2630L CPUs, 32 GB RAM, and running 64-bit Debian GNU/Linux 8 and CADP 2017-i. Each measurement corresponds to the average of ten executions.

#### **4.1 Test Purposes Taken from Case Studies**

Table 1 summarizes the results for some selected examples. The first two have been kindly provided by Alexander Graf-Brill, and correspond to initial versions of TPs for his EnergyBus model [20]; both aim at exhibiting a particular boot sequence, the second one using REFUSE transitions. The next four examples have been used by STMicroelectronics to verify a cache-coherence protocol [27]. The last three correspond to the three TPs presented in Sect. 3.4 and check the correctness of a simplified<sup>2</sup> version of the asynchronous implementation of the DES (Data Encryption Standard) [37]. These examples cover a large spectrum of characteristics: from no τ-transitions (ACE) to huge confluent τ-components (DES), from few visible transitions (DES) to many outgoing visible transitions (EnergyBus), and a test selection more or less guided via refusal states.

**Table 1.** Run-time performance for selected examples

Execution time is given in seconds and memory usage in MB.

We observe that TESTOR requires less memory than TGV for all examples, but most significantly for the DES. However, although TESTOR is several orders of magnitude slower than TGV for the DES when using the synchronous product (TPs PURPOSE1 and PURPOSE2), TESTOR requires only two seconds to generate a TC or CTG when using an LNT parallel composition with the TP with data handling PURPOSE3. This is because the LNT parallel composition, handled by the LNT compiler, enables more aggressive optimizations. Thus, using LNT parallel composition to annotate the model's accepting and refusal states is not only more convenient (thanks to the multiway rendezvous) and data aware, but also much more efficient — it is even possible to generate a TC for the original DES model (167 million states, 1.5 billion transitions) in less than 40 min.

For the ACE examples, TESTOR is both faster and requires less memory than TGV. This is partly due to an optimization of TESTOR, which deactivates the various reductions of τ-transitions. For a fair comparison, we also ran experiments forcing the execution of these reductions. For the extraction of a TC, this increases the execution time by a factor of two and the memory requirements by a factor of three. For the computation of a CTG, this increases the memory requirements by a factor of one and a half, without modifying the execution time significantly.

<sup>2</sup> The S-boxes are executed sequentially rather than in parallel and the gate SUBKEY is left visible to separate the iterations of the DES algorithm and thus significantly reduce the size of τ-components. For the extraction of a TC for PURPOSE2 from the full version of the DES, TESTOR would run for several weeks and TGV would require more than 700 GB of RAM.

#### **4.2 Automatically Generated Test Purposes**

To evaluate the performance, we used a collection of 9791 LTSs with up to 50 million transitions, taken from the non-regression test-base for CADP. For each LTS M of the collection, we automatically generated two TPs: one to test the reachability of an action and another to test the presence of an execution sequence. For the former TP, we sorted the actions of the LTS alphabetically, and checked the reachability of the first action, considering the second half of the action set as inputs. For the latter TP, we used the EXECUTOR tool<sup>3</sup> to extract a sequence of up to 1000 visible actions, which we transformed into a TP, considering all actions whose ranking is an odd number as inputs. Technically, this transformation consists in adding to each state of the sequence a self-loop labeled with τ and a \*-transition to a refusal state.
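The transformation of an extracted sequence into a TP can be sketched in Python as follows. This is a hedged illustration, not the scripts used for the experiments: the function `sequence_to_tp` and the integer state numbering are hypothetical, and the sketch assumes that the τ self-loop and the \*-transition are added to every state that still has an action of the sequence to perform.

```python
from typing import List, Tuple

Transition = Tuple[int, str, int]


def sequence_to_tp(actions: List[str]) -> Tuple[List[Transition], int, int]:
    """Build a TP that follows `actions` in order.

    State i is reached after the first i actions; state len(actions) is the
    accepting state and state len(actions) + 1 is the refusal state.
    Returns (transitions, accepting_state, refusal_state).
    """
    accepting = len(actions)
    refusal = accepting + 1
    transitions: List[Transition] = []
    for i, action in enumerate(actions):
        transitions.append((i, action, i + 1))   # next step of the sequence
        transitions.append((i, "tau", i))        # self-loop labeled with tau
        transitions.append((i, "*", refusal))    # anything else is refused
    return transitions, accepting, refusal


tp, acc, ref = sequence_to_tp(["a", "b"])
```

For a sequence of n actions, the resulting TP has n + 2 states and 3n transitions.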

From the generated pairs (M, TP) we eliminated those for which the automatic generation of a TP failed (for instance, due to special actions that would require particular treatment) and those for which the computation of a TC or CTG took too much time or required too much memory by either TESTOR or TGV. This led to a collection of 13,142 pairs (M, TP) for which both tools could extract a TC. For 12,654 of them, both tools also could compute the CTG. Figure 3 displays the results for each example, using logarithmic scales for both execution time and memory requirements, to make the differences for small values more visible.

As for the case studies, we observe that TESTOR and TGV choose different tradeoffs between computation time and memory requirements. On average, TESTOR requires 0.3 times less memory and runs 1.3 (respectively 0.5) times faster to compute a TC (respectively the CTG). When considering only the 1005 pairs with more than 500,000 transitions in the LTS, the average numbers show a larger difference. On average for these larger examples, to compute a CTG, TESTOR requires 1.4 times less memory, but runs 3.5 times longer; to compute a TC, TESTOR requires 2.7 times less memory and runs 0.7 times faster.

Also, while both tools required the exclusion of examples due to excessive runtime, we excluded several examples due to insufficient memory for TGV, but not for TESTOR. Given that TCs are usually much smaller than CTGs, the on-the-fly extraction of a TC by TESTOR is generally faster and consumes less memory than the generation of the CTG. We also observed that the CTGs produced by TESTOR are sometimes smaller than (although strongly bisimilar to) those produced by TGV.

While trying to understand these results in more detail, we found examples where each tool is one or two orders of magnitude faster or more memory-efficient than the other. Indeed, the benefits of the different reductions applied in the tools depend heavily on the characteristics of the example, most notably the sizes of the various subgraphs explored (τ-components, L2A). For instance, when the model M does not contain any τ-transition, there is no point in applying the reductions (τ-compression, τ-confluence, and τ-closure).

<sup>3</sup> http://cadp.inria.fr/man/executor.html.

**Fig. 3.** Compared performance of TESTOR and TGV

The modular architecture of TESTOR enabled us to easily experiment with variants of the algorithm used for solving the BES underlying ϕ<sub>l2a</sub>. By default, when extracting a TC on the fly, we use the depth-first search (DFS) algorithm, which for disjunctive BESs stores only variables and not their dependencies (and hence only the states, and not the transitions of the model). Using the breadth-first search (BFS) algorithm of the solver produces smaller TCs, because it generates the shortest diagnostic sequences for states in L2A. However, this comes at the price of an increased execution time and memory consumption, a known phenomenon regarding BFS versus DFS algorithms [32]. Thus, one can choose between BFS and DFS resolution depending on whether the size of the TC extracted on the fly is judged more important than the resources required to compute it.
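The effect of the search order on the length of diagnostic sequences can be seen on a toy graph. The Python sketch below is illustrative only: the functions `dfs_path` and `bfs_path` and the example graph are hypothetical, and, unlike the solver described above, both functions store complete paths for simplicity.

```python
from collections import deque


def dfs_path(graph, start, accepting):
    """First path to an accepting state found by depth-first search."""
    stack = [(start, [start])]
    seen = set()
    while stack:
        state, path = stack.pop()
        if state in accepting:
            return path
        if state in seen:
            continue
        seen.add(state)
        # Push successors in reverse so that the first one is explored first.
        for nxt in reversed(graph.get(state, [])):
            stack.append((nxt, path + [nxt]))
    return None


def bfs_path(graph, start, accepting):
    """Shortest path to an accepting state, found by breadth-first search."""
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state in accepting:
            return path
        for nxt in graph.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None


graph = {"s": ["a", "acc"], "a": ["b"], "b": ["acc"]}
long_sequence = dfs_path(graph, "s", {"acc"})
short_sequence = bfs_path(graph, "s", {"acc"})
```

On this graph, DFS commits to the first successor and returns a sequence of three transitions, whereas BFS returns the direct transition to the accepting state.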

#### **5 Related Work**

Although model-based conformance testing has been intensively studied, there are only a few tools that use variants of the **ioco** conformance relation and that are still actively developed [4]. Other model-based tools for combinatorial and statistical testing, or white box testing are described in [43]. In the following, we compare TESTOR to the most closely related tools.

TorX [42] and JTorX [2] are online test generation tools, equipped with a set of adapters to connect the tester to the SUT. The latest versions support test purposes (TPs), but they are used differently than in TESTOR. Indeed, JTorX yields a two-dimensional verdict [3]: one dimension is the **ioco** correctness verdict (pass or fail), and the other dimension is an indication whether the test objective has been reached. This contrasts with TESTOR, which generates test cases (TCs) ensuring by construction that the execution stays inside the lead to accept states (L2A), and stopping the test execution as soon as possible with a verdict: **fail** if non-conformance has been detected, **pass** if an accepting state has been reached, or **inconclusive** if leaving L2A is unavoidable.

Uppaal is a toolbox for the analysis of timed systems, modeled as timed automata extended with data. Three test generation tools exist for Uppaal timed automata. Uppaal-Tron [28] is an online test generation tool, taking as input a specification and an environment model, used to constrain the test generation. Uppaal-Tron is also equipped with a set of adapters to derive and execute the generated tests on the SUT. Contrary to TESTOR, the TCs generated from Uppaal-Tron can be irrelevant, because the generation is not guided by TPs. Uppaal-Cover [22] generates offline a comprehensive test suite from a deterministic Uppaal model and coverage criteria specified by observer automata. Uppaal-Cover attempts to build a small test suite satisfying the coverage criteria, by selecting those TCs satisfying the largest parts of the coverage criteria. In contrast to TESTOR and Uppaal-Tron, Uppaal-Cover generates offline tests. Offline generation does not face the state-space explosion, but also limits the expressiveness of the specification language (e.g., nondeterministic models are not allowed). Uppaal-Yggdrasil [26] generates offline test suites for deterministic Uppaal models, using a three-step strategy to achieve good coverage: (i) a set of reachability formulas, (ii) random execution, and (iii) structural coverage of the transitions in the model. The guidance of the test generation by a temporal logic formula is similar to the use of a TP. However, the TPs supported by TESTOR (and TGV) can express more complex properties than reachability, and enable one to control the explored part of the model (using refusal states).

On-the-fly test generation tools also exist for the synchronous dataflow language Lustre [21], e.g., Lutess [12], Lurette [24], and Gatel [29]. Contrary to TESTOR, these tools do not check the **ioco** relation, but randomly select TCs, satisfying constraints of an environment description and an oracle.

In IOLTS, actions are monolithic, which is ill-suited to realistic models that involve data handling. STG (Symbolic Test Generator) [11] breaks the monolithic structure of actions, enabling access to the data values, and generates tests on the fly, handling data values symbolically. This enables more user-friendly TPs and more abstract TCs, because not all possible values have to be enumerated. However, the complexity of symbolic computation is not negligible in practice. When using the LNT parallel composition, TESTOR can handle data (see example in Sect. 3.4) without the cost of symbolic computation, but still has to enumerate data explicitly when generating the TC. T-Uppaal [34] uses symbolic reachability analysis to generate tests on the fly and then simultaneously executes them on the SUT. The complexity of symbolic algorithms turns out to be expensive for online testing.

When executing a generated TC against a SUT, it is necessary to refine it to take into account the asynchronous communication between the SUT and the tester. Actually, the SUT accepts every input at any time, whereas the TC is deterministic, i.e., there is no choice between an input and an output. An approach for connecting a TC (randomly selected) and an asynchronous SUT was defined in [44]. A similar approach using TPs to guide the test generation was proposed in [5] and subsequently extended to timed automata [6]. Recently, this kind of connection was automated by the MOTEST tool [19].

### **6 Conclusion**

We presented TESTOR, a new tool for on-the-fly conformance test case generation for asynchronous concurrent systems. Like the existing tool TGV, TESTOR was developed on top of the CADP toolbox [16] and brings several enhancements: online testing by generating (controllable) test cases completely on the fly; a more versatile description of test purposes using the LNT language; and a modular architecture involving generic graph manipulation components from the OPEN/CAESAR environment [15]. The modularity of TESTOR simplifies maintenance and fine-tuning of graph manipulation components, e.g., by adding or removing on-the-fly reductions, or by replacing the synchronous product. Besides the ability to perform online testing, the on-the-fly test selection algorithm sometimes makes it possible to extract test cases even when the generation of the complete test graph (CTG) is infeasible.

The experiments we carried out on tens of thousands of benchmark examples and three industrial case studies show that TESTOR consumes less memory than TGV, which in turn is sometimes faster, for generating CTGs. We plan to experiment with state space caching techniques [33] and with other on-the-fly reductions to accelerate CTG generation in TESTOR. We also plan to investigate how to facilitate the description of test purposes, by deriving them from the action-based, branching-time temporal properties of the model (following the results of [13] in the state-based, linear-time setting) or by synthesizing them according to behavioral coverage criteria.

**Acknowledgements.** We are grateful to Alexander Graf-Brill and Holger Hermanns for providing us with the model and test purposes of their EnergyBus case study. We also thank Hubert Garavel for helpful remarks about the paper. This work was supported by the Région Auvergne-Rhône-Alpes within the program ARC 6.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Optimal Dynamic Partial Order Reduction with Observers**

Stavros Aronis, Bengt Jonsson, Magnus Lång, and Konstantinos Sagonas

Department of Information Technology, Uppsala University, Uppsala, Sweden {stavros.aronis,bengt.jonsson,magnus.lang,konstantinos.sagonas}@it.uu.se

**Abstract.** Dynamic partial order reduction (DPOR) algorithms are used in stateless model checking (SMC) to combat the combinatorial explosion in the number of schedulings that need to be explored to guarantee soundness. The most effective of them, the Optimal DPOR algorithm, is optimal in the sense that it explores only one scheduling per Mazurkiewicz trace. In this paper, we enhance DPOR with the notion of *observability*, which makes dependencies between operations conditional on the existence of future operations, called *observers*. Observers naturally lead to a lazy construction of dependencies. This requires significant changes in the core of POR algorithms (and Optimal DPOR in particular), but also makes the resulting algorithm, Optimal DPOR with Observers, super-optimal in the sense that it explores exponentially fewer schedulings than Mazurkiewicz traces in some cases. We argue that observers come naturally in many concurrency models, and demonstrate the performance benefits that Optimal DPOR with Observers achieves in both an SMC tool for shared memory concurrency and a tool for concurrency via message passing, using both synthetic and actual programs as benchmarks.

### **1 Introduction**

Testing and verification of concurrent programs is hard, as it requires reasoning about all the ways in which operations executed by different processes (or threads) can interfere. *Stateless model checking (SMC)* [12] is a technique with low memory requirements that can be effective in finding concurrency errors or proving that a program cannot reach an error state by systematically exploring all the ways in which such operations can be interleaved. The technique requires taking control of the scheduler and subsequently executing the program multiple times, each time imposing a different scheduling of the processes. By considering every process at every execution step, however, the number of possible schedulings grows exponentially w.r.t. the total length of program execution. *Partial order reduction (POR)* techniques [9,11,20,22] address this problem by prescribing the exploration of only a subset of schedulings, albeit a subset that is sufficient to cover all behaviours. POR techniques take advantage of the fact that most pairs of operations by different processes in typical concurrent programs are not interfering. As a result, a scheduling E′ that can be obtained from another scheduling E by swapping adjacent but non-interfering (independent) execution steps will make the program behave in exactly the same way as E; such schedulings have the same partial order of interfering operations and belong to the same equivalence class, called a *Mazurkiewicz trace* [19]. It is sufficient for SMC algorithms to explore only one scheduling in each such equivalence class.
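The equivalence induced by swapping adjacent independent steps can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: steps are encoded as hypothetical (process, operation, variable) triples, interference is approximated by the usual read/write conflict rule, and the three-step example program is invented for the purpose.

```python
from itertools import permutations


def dependent(a, b):
    """Two steps interfere if they belong to the same process, or access the
    same variable with at least one of them being a write ('w')."""
    same_process = a[0] == b[0]
    conflict = a[2] == b[2] and "w" in (a[1], b[1])
    return same_process or conflict


def trace_of(scheduling):
    """All schedulings reachable from `scheduling` by repeatedly swapping
    adjacent independent steps, i.e., its Mazurkiewicz trace."""
    seen = {scheduling}
    work = [scheduling]
    while work:
        s = work.pop()
        for i in range(len(s) - 1):
            if not dependent(s[i], s[i + 1]):
                swapped = s[:i] + (s[i + 1], s[i]) + s[i + 2:]
                if swapped not in seen:
                    seen.add(swapped)
                    work.append(swapped)
    return frozenset(seen)


# Hypothetical program: p writes x, q writes y, r reads x (one step each).
steps = (("p", "w", "x"), ("q", "w", "y"), ("r", "r", "x"))
traces = {trace_of(s) for s in permutations(steps)}
```

In this example, only the steps of p and r interfere, so the six schedulings fall into two traces of three schedulings each, and exploring one scheduling per trace suffices.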

POR algorithms operate by examining pairs of interfering operations. If it is possible to execute such operations in the reverse order, then their partial order will be different, and a scheduling from the relevant equivalence class must also be explored. For soundness, POR techniques need to be conservative, treating operations as interfering even in cases where they are not. Increasing the accuracy of interference detection can therefore significantly improve the effectiveness of any POR technique. In early POR techniques, interference was determined statically, leading to over-approximations and limiting the achievable reduction. The efficiency of POR was later increased using semantic information to decide which operations interfere [13]. *Dynamic Partial Order Reduction* (DPOR) [10] further improved the effectiveness of POR algorithms by allowing interference to be determined from data obtained during the program's execution.

In this paper, we introduce the notion of *observability* of operations, allowing *observer* operations that appear later in a scheduling to be used when deciding whether earlier operations are interfering. We start by explaining observers with a series of examples (Sect. 2), and continue by presenting key notions of DPOR and explaining why using observers in DPOR algorithms is challenging (Sect. 3). We then present a formal framework (Sect. 4) and describe an extension to the Optimal DPOR algorithm [2] that enables use of observers (Sect. 5). The extension is generic in the sense that it can be applied to several models of concurrency, such as shared memory and message passing. We demonstrate this claim by two implementations: one in an SMC tool for C/C++ programs with pthreads and one in an SMC tool for Erlang programs (Sect. 6). Finally, in Sect. 7 we evaluate our implementations and show that Optimal DPOR with Observers can achieve significantly better reduction in both synthetic and 'real' programs.

### **2 DPOR and Observers by Example**

Consider the program shown in Fig. 1 in which a main process spawns two concurrent processes, p and q, which issue write operations on two different shared variables x and y. After p and q finish their execution, the main process reads the values of x and y and checks a correctness property. A DPOR algorithm will begin exploring this program by executing an arbitrary scheduling; see Fig. 1 (middle). Nodes show the values of the shared variables and each transition consists of an execution step. By inspecting the operations in this scheduling, the algorithm sees that if the second step of q is scheduled before the second step of p, the partial order of the writes to the y variable is different. It therefore plans to execute a scheduling in which the second step of p happens after the one from q. The start of this scheduling can be denoted as p.q. Similarly, the order of the writes on x can be reversed, by executing q's first step before the first step of p. Therefore, a scheduling starting with q should also be explored. In Optimal DPOR [2], future explorations are added as partial schedulings, forming *wakeup trees* (shown in blue). These trees are quite trivial in this example, each consisting of a single path.

**Fig. 1.** Writers program (its correctness property as assertion) and two of its schedulings.

The algorithm continues exploration from the "deepest" point where a new scheduling should be tried; in the example, this is the (1,0) node. A second scheduling is explored with the intention to execute some operation before the second step of p. Without any other constraint, a non-optimal DPOR algorithm could execute p's second step immediately after the first step of q, ending up in a state identical with the previously explored (2,1) and then again in (2,2). The *sleep sets* technique [11] can be used to avoid or stop such redundant explorations. Sleep sets retain information from already explored earlier process steps that have not yet been 'overtaken' by some step in the current exploration. In our example, information about p's second step is retained in the sleep set until some other interfering operation (here q's second step) has been executed. Moreover, sleep sets can be used to infer that swapping (again) the second step of p and the second step of q (based on their interference in the second scheduling) is redundant. Any DPOR algorithm using sleep sets will explore four schedulings for this program (instead of the six possible). Each of these four schedulings leads to a different final state. Notice that two writes on the same variable were always deemed as interfering.
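The counts of six schedulings and four final states can be reproduced with a small Python sketch. The concrete writes are an assumption consistent with the states mentioned above, namely that p executes x := 1; y := 1 and q executes x := 2; y := 2, starting from (0,0); the function names are hypothetical and the final assertion of the main process is omitted.

```python
def interleavings(a, b):
    """All interleavings of two step sequences, preserving program order."""
    if not a:
        yield tuple(b)
        return
    if not b:
        yield tuple(a)
        return
    for rest in interleavings(a[1:], b):
        yield (a[0],) + rest
    for rest in interleavings(a, b[1:]):
        yield (b[0],) + rest


def final_state(scheduling):
    """Execute the writes in order and return the final (x, y)."""
    memory = {"x": 0, "y": 0}
    for variable, value in scheduling:
        memory[variable] = value
    return memory["x"], memory["y"]


P = (("x", 1), ("y", 1))   # assumed steps of process p
Q = (("x", 2), ("y", 2))   # assumed steps of process q
schedulings = list(interleavings(P, Q))
finals = {final_state(s) for s in schedulings}
```

Two pairs of schedulings differ only in the order of a write on x and a write on y, which do not interfere, so the six schedulings yield only four distinct final states.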

Consider now the program shown below. The shared variable x (whose initial value is 0) is accessed by processes p, q and r.

$$\begin{array}{ccccc} p & & q & & r \\ \mathtt{x := 1} & \parallel & \mathtt{x := 2} & \parallel & \mathtt{assert(x < 3)} \end{array}$$

Here, the correctness property is checked by process r. If interference is decided using the same criteria as a *data race* (i.e., two operations interfere if they access the same memory location and at least one of them is a write), then all three operations interfere with each other. As a result, each of the 3! = 6 possible interleavings has a different partial order and therefore belongs to a different Mazurkiewicz trace that should be explored by a DPOR algorithm. In schedulings starting with r, however, the order of the execution of p and q is irrelevant (if one does not care about the final contents of the memory), as the values written by these operations will never be read. A DPOR algorithm could detect that the written values are *not observed* and consider the write operations as non-interfering.

Taking this idea further, consider a next example, shown below. Here, N processes write on the shared variable x, and as a result there exist N! schedulings.

$$\begin{array}{c} \begin{array}{ccccccc} p_1 & & p_2 & & \dots & & p_N \\ \mathtt{x := 1} & \parallel & \mathtt{x := 2} & \parallel & \dots & \parallel & \mathtt{x :=}\ N \end{array} \\ \text{join processes } p_1, p_2, \dots, p_N; \\ \mathtt{assert(x > 0)} \end{array}$$

In each such scheduling, however, only the last written value will be read. A DPOR algorithm could consider write operations that are not subsequently observed as independent and therefore explore just N instead of N! schedulings, thereby achieving an exponential reduction.
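The gap between N! schedulings and N observable outcomes can be checked by brute force for small N. The following Python sketch is illustrative only; the function `observed_values` is hypothetical and simply executes every ordering of the N writes before the final read.

```python
from itertools import permutations
from math import factorial


def observed_values(n):
    """Run every scheduling of n writers (writer i executes x := i), then
    read x. Returns the number of schedulings and the set of values read."""
    observed = set()
    count = 0
    for order in permutations(range(1, n + 1)):
        x = 0
        for value in order:      # execute the writes in this order
            x = value
        observed.add(x)          # the read after the join sees the last write
        count += 1
    return count, observed


count, observed = observed_values(4)
```

For N = 4, the sketch runs 24 schedulings, but the final read distinguishes only 4 of them, one per last writer.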

In the last two examples, better reduction could be obtained if the interference of write operations, which are conservatively considered as "always interfering", was characterized more accurately by looking at complete executions and taking observability by "future" operations into account. This idea is applicable not only in shared memory but also in other models of concurrency. In the following message-passing program, processes p and q each send a different message to the mailbox of process r using the send operator "!". Process r uses a receive operation to retrieve a message and store it in a (local) variable x.

$$\begin{array}{ccccc} p & & q & & r \\ r \mathbin{!} \mathtt{M}_1 & \parallel & r \mathbin{!} \mathtt{M}_2 & \parallel & \mathtt{receive\ x} \end{array}$$

If we assume that receive operations pick and return the oldest message in the mailbox or return null if no message exists, send operations can interfere (the order of delivery is significant) and so can send and receive operations (an empty mailbox can yield a different value). As a result, six schedulings are possible. However, only three schedulings really need to be explored: the receive operation interferes only with the earliest send operation and cannot be affected by a later send; moreover, if the receive operation is executed first, the order of the send operations is irrelevant.

If we instead assume that receive operations *block* if no matching message exists, only *two* schedulings need to be explored, as r can receive either M<sub>1</sub> or M<sub>2</sub>. Again, if we generalize the example to N processes instead of just two, the behaviour is similar to the program with N writes: only N schedulings (instead of N!) are relevant, each determined by the first message delivered; the remaining message deliveries are not observable. Note that, in this concurrency model, we are interested in the observability of the *first* instead of the last operation in an execution sequence.
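Both counts (three schedulings with a non-blocking receive, two with a blocking one) can be recovered by enumerating the orderings of the three steps and grouping them by the value stored in x. The Python sketch below is a simplified model: the step names, the function `received`, and the treatment of a blocked receive as an infeasible scheduling are assumptions made for illustration.

```python
from itertools import permutations

STEPS = ("p", "q", "r")   # p: r ! M1,  q: r ! M2,  r: receive x


def received(order, blocking):
    """Value stored in x by r under the given scheduling.

    Returns None if the receive would have to run on an empty mailbox while
    being blocking, i.e., the scheduling is not executable as given.
    """
    mailbox = []
    for step in order:
        if step == "p":
            mailbox.append("M1")
        elif step == "q":
            mailbox.append("M2")
        elif mailbox:
            return mailbox[0]          # oldest message in the mailbox
        elif blocking:
            return None                # receive is not enabled yet
        else:
            return "null"              # non-blocking receive, empty mailbox
    return None


nonblocking = {received(o, blocking=False) for o in permutations(STEPS)}
blocking = {received(o, blocking=True) for o in permutations(STEPS)} - {None}
```

Grouping the six orderings by the received value leaves three classes in the non-blocking case and two in the blocking case, matching the numbers given above.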

In some message-passing concurrency models (e.g., in Erlang programs [4]), it is further possible to use *selective* receive operations instead, which also block when no message can be selected. Using this feature, the previous program can be generalized and rewritten so that r is explicitly picking messages in order, using pattern matching. Such a program is shown on the right. Here r wants to pick up the N messages in order: first M<sub>1</sub>, then M<sub>2</sub>, etc. Thus, the order of delivery of messages is irrelevant. A DPOR algorithm could take advantage of the additional information provided by the selective receive operations, notice that the messages from p<sub>i+1</sub>, ..., p<sub>N</sub> cannot be selected before the message from p<sub>i</sub>, and therefore determine that the N sends are independent. A *single* scheduling is enough to explore all behaviours of the program!
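A small simulation illustrates why the delivery order is unobservable here. This is a sketch under simplifying assumptions: each sender p<sub>i</sub> sends one message M<sub>i</sub>, all deliveries happen before r starts receiving, and a pattern matches a message only if they are equal. The helper names `selective_receive` and `receiver_results` are hypothetical.

```python
from itertools import permutations


def selective_receive(mailbox, pattern):
    """Remove and return the oldest message matching `pattern`.

    Returns None if no message matches (a real receive would block).
    """
    for i, msg in enumerate(mailbox):
        if msg == pattern:
            return mailbox.pop(i)
    return None


def receiver_results(n):
    """Collect what r receives, over every possible delivery order."""
    messages = [f"M{i}" for i in range(1, n + 1)]
    results = set()
    for delivery_order in permutations(messages):
        mailbox = list(delivery_order)
        # r picks M1, then M2, ..., regardless of arrival order.
        received = tuple(selective_receive(mailbox, m) for m in messages)
        results.add(received)
    return results


print(receiver_results(4))
```

All N! delivery orders lead to the same sequence of received messages, which is why a single scheduling suffices for this program.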

Having explained the concept of *observability* of operations by examples, let us see how it can be combined with the Optimal DPOR algorithm and achieve such reductions.

#### **3 Using Observers in a DPOR Algorithm**

Our objective is to construct a DPOR algorithm that *lazily* considers interferences based on the existence of *later* operations, called *observers*. In the simplest case, operations that would be conservatively considered interfering are treated as independent in the absence of an observer. Examples in Sect. 2 included write operations whose values were never read, or cases where the order of message deliveries does not affect the order in which the messages are received.

The intuition behind such an SMC approach comes from the fact that it is only operations that *observe* a value (e.g., assertions, receive statements, etc.) that can influence the control flow and lead to erroneous or generally unexpected behaviour. Other operations (e.g., writes, sends, etc.) cannot affect program behaviour if no future operation observes their effects. In such cases, interference between those other operations can be ignored.

#### **3.1 POR Concepts and Optimal DPOR**

The goal of POR techniques is the exploration of only a (small) subset of the possible schedulings of a concurrent program which is *sound*; that is, a subset that includes at least one scheduling from each Mazurkiewicz trace. DPOR algorithms perform a depth-first exploration of the tree of all possible schedulings. Reduction is achieved by exploring only a sound subset of all scheduling choices that are possible at each point in the tree. Such subsets are formed on the basis of two complementary techniques.

– Each point in the tree is associated with a *sleep set*, which contains a set of processes whose exploration would be redundant. More precisely, a process p is in the sleep set after a sequence of form E.v if p has previously been explored after E, and furthermore p does not interfere with v. Thus, exploring E.v.p is redundant, since it was previously explored after E.p (as E.p.v).

– From each point in the tree, the set of explored processes must form a *source set* [2]. (Some DPOR algorithms employ persistent or stubborn sets, which are subsumed by source sets.) Source sets have the property that for any extension which forms a complete (aka *maximal*) scheduling, there is an equivalent extension in which the next step is taken by a process in the source set. A source set is constructed incrementally during the exploration by inspecting encountered races: whenever a scheduling of form E.p.v is explored, in which the step of p is in a race with some step in v, then the reversal of that race will be explored in some other scheduling, where some process q in v is scheduled immediately after E: this is achieved by adding q to the source set after E.
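The classical sleep set rule from the first item can be sketched as follows. This is a simplified illustration that assumes a fixed, process-level interference predicate (`interferes`, a hypothetical parameter) and ignores enabledness; it is not the bookkeeping of any particular tool.

```python
def sleep_after_step(sleep, v, interferes):
    """Sleep set after E.v, given the sleep set after E.

    A sleeping process stays asleep only if it does not interfere with v.
    """
    return {p for p in sleep if p != v and not interferes(p, v)}


def explore_order(candidates, sleep, interferes):
    """Yield (process, sleep set for its subtree) for each step explored after E."""
    sleep = set(sleep)
    for p in candidates:
        if p in sleep:
            continue  # exploring p from here would be redundant
        yield p, sleep_after_step(sleep, p, interferes)
        sleep.add(p)  # p has now been explored after E


# p and q interfere with each other; r is independent of both.
def interferes(a, b):
    return {a, b} == {"p", "q"}


for proc, sleeping in explore_order(["p", "q", "r"], set(), interferes):
    print(proc, sorted(sleeping))
```

In this run, after exploring p and q from the same point, the subtree that starts with r has both p and q asleep, since E.r.p and E.r.q were already covered as E.p.r and E.q.r.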

Most existing DPOR algorithms prescribe that from each point in the tree (i) all processes in a source set should be explored, and (ii) no process in the sleep set should be explored. However, these principles are not sufficient to avoid redundant exploration [2]. The reason is that the reversal of a race in E.p.v may happen only by exploring a particular subsequence of v; since a source set can only contain the first step in such a sequence, it cannot prevent continued exploration beyond that first step from being redundant. Optimal DPOR improves on earlier techniques by using *wakeup trees* [2] in addition to sleep sets. Wakeup trees are composed of partial execution sequences (called *wakeup sequences*) that (a) reverse the order of the interfering operations, and (b) are provably non-redundant. Optimal DPOR, currently the state-of-the-art DPOR algorithm, always uses wakeup sequences to explore new schedulings. As a result, Optimal DPOR does not even initiate redundant exploration, and can achieve exponential reduction over, e.g., the original [10] or the Source DPOR [2] algorithm.

#### **3.2 Observers and Sleep Sets**

The use of sleep sets is not trivial when using observers, because interference between events can often not be determined when they occur, but only later in the scheduling. Let us illustrate using an example. In the next program, three processes (p, q and s) send tagged messages (with tags A and B) to a receiver process r, which uses selective receive to read matching messages from its mailbox. Each message also contains the process identifier of the sender.

$$\begin{array}{c||c||c||c} p & q & s & r \\ \begin{array}{l} r \mathbin{!} \{\mathsf{B}, p\}; \\ r \mathbin{!} \{\mathsf{A}, p\} \end{array} & r \mathbin{!} \{\mathsf{A}, q\} & r \mathbin{!} \{\mathsf{B}, s\} & \begin{array}{l} \mathsf{receive}\ \{\mathsf{A}, \mathsf{x}\}; \\ \mathsf{if}\ (\mathsf{x} == p) \\ \quad \mathsf{receive}\ \{\mathsf{B}, \mathsf{y}\} \end{array} \end{array}$$

In standard DPOR, the sends are interfering, since the order of delivery can affect the values assigned to the x and y variables in r. Using observers, sends are interfering only if justified by an observing receive operation. Assume that the first explored scheduling is p.p.q.s.r.r. Here, the second send by p (sending the message tagged with A) interferes with the send by q, since their order is observed by the first receive of r (if the message from q had been delivered first, it would have been the one picked instead). Furthermore, the first send by p (sending the message tagged with B) interferes with the message sent by s, since they have the second receive of r as observer. In order to explore the reversal of the race between the first send of p and that of s, the algorithm needs to explore a scheduling in which p's first send is executed after s. Such a scheduling must clearly start with s. The rules for sleep sets prescribe that p should be in the sleep set at the start of this exploration, and that p should be removed from the sleep set after executing s if p and s interfere. However, this interference is visible only later, making it unclear what to do. On the one hand, removing p from the sleep set on the grounds that it "might" interfere with s risks exploring redundant schedulings and defeats the purpose of observers. On the other hand, keeping p in the sleep set to "see what happens" prevents exploring the effects of the race reversal, since that requires the second send of p to be explored before q, which is forbidden if p remains in the sleep set. Thus, sleep sets are not a sufficiently precise mechanism for avoiding redundant exploration without missing non-redundant schedulings.
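To make the example concrete, the sketch below enumerates all delivery orders of the four messages and runs r's two selective receives on the resulting mailbox. It assumes that p's two messages arrive in program order and, for simplicity, that r runs after all deliveries; the names `SENDS`, `selective_receive` and `outcomes` are hypothetical.

```python
from itertools import permutations

# Messages as (tag, sender); indices 0 and 1 are p's sends, in program order.
SENDS = [("B", "p"), ("A", "p"), ("A", "q"), ("B", "s")]


def selective_receive(mailbox, tag):
    """Remove the oldest message with the given tag and return its sender."""
    for i, (t, sender) in enumerate(mailbox):
        if t == tag:
            mailbox.pop(i)
            return sender
    return None


def outcomes():
    """Map each observable (x, y) result to the delivery orders producing it."""
    result = {}
    for order in permutations(range(len(SENDS))):
        if order.index(0) > order.index(1):
            continue  # p's first send must be delivered before its second
        mailbox = [SENDS[i] for i in order]
        x = selective_receive(mailbox, "A")
        y = selective_receive(mailbox, "B") if x == "p" else None
        result.setdefault((x, y), []).append(order)
    return result


for (x, y), orders in sorted(outcomes().items(), key=str):
    print(x, y, len(orders))
```

The twelve delivery orders collapse into three observable results. The result in which y is bound to s requires the message of s to be delivered before p's first send and p's second send before that of q, which matches the race reversal discussed above.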

#### **3.3 Introducing Observers to Optimal DPOR**

We will now explain how Optimal DPOR can be adapted to work with observers. There are two main challenges: (1) we need to address the fact that, in the presence of observers, interference is conditional, and (2) we also need a suitable replacement for sleep sets, since we can no longer use them to guarantee that there is no redundant exploration.

In Optimal DPOR, it is assumed that operations that are interfering in some execution sequence remain interfering in any prefix of that sequence. This is no longer true when we determine interference by the existence of observing operations. If an observer is not included in a prefix of an execution sequence in which two operations were observably interfering, the same two operations will be independent in that prefix.

To address challenge 1 in Optimal DPOR with observers, we need to extend the wakeup sequences constructed for reversing the order of interfering operations that require an observer with a suffix that includes the observer. This suffix may include operations happening after the interfering operations (even in program order); any such operations will behave identically in the reversal because, in the original scheduling, the observer was the first event that could be affected by the ordering of the interfering operations.

To address challenge 2, we can build on the intuition behind sleep sets and assert that when our algorithm is done with a particular state, it has explored all schedulings that can start with the step that led to that state. When the algorithm considers a new scheduling (based on a wakeup sequence), information about observers in that scheduling needs to be recalculated from the operations in the sequence. The algorithm can then perform an exhaustive test that ensures that each step previously explored from any point in the execution is overtaken by some other step in the wakeup sequence under consideration.

### **4 Framework**

We consider a concurrent system composed of a finite set of *processes* (or threads). Each process executes a deterministic program, in which statements act on the global *state* of the system. Processes can interact via shared variables, messages, etc. We assume that the state space does not contain cycles, and that executions have bounded length. A step of a process may not disable another process.

Formally, let Σ be the set of states of a concurrent system and s<sub>0</sub> ∈ Σ be the unique *initial state*. The partial function *execute*<sub>p</sub> : Σ → Σ describes execution, representing an atomic *execution step* of process p, which may depend on and affect the state. An *execution sequence* E of the system is a finite sequence of execution steps of its processes that is performed from the initial state. We use ⟨⟩ to denote the empty sequence and . to denote concatenation of sequences of process steps (e.g., p.p.q denotes the execution sequence where first p performs two steps, followed by a step of q). The sequence of process steps in E also uniquely determines the state of the system after E, which is denoted s<sub>[E]</sub>. For a state s, let *enabled*(s) denote the set of processes p that are enabled in s (i.e., for which *execute*<sub>p</sub>(s) is defined). If p ∈ *enabled*(s<sub>[E]</sub>), then E.p is an execution sequence. A sequence E is *maximal* if *enabled*(s<sub>[E]</sub>) = ∅, i.e., no process is enabled after E. An *event* ⟨p, i⟩ of E is a particular occurrence of a process in E, representing the i-th occurrence of process p in the execution sequence. We use w, w′, ... to range over sequences, e, e′, ... to range over events, as well as:


We assume a function which assigns a *happens-before relation* [15] to any execution sequence E, denoted as →<sub>E</sub>.
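The basic notions of this framework (partial *execute* functions, *enabled*, maximal sequences, and events as numbered occurrences) can be modelled directly. The following is a minimal sketch on a toy system; the state representation and the helper `make_counter` are assumptions for illustration only.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Tuple[int, ...]
# execute_p as a partial function: None means "p is not enabled in this state".
Step = Callable[[State], Optional[State]]


def enabled(procs: Dict[str, Step], s: State) -> set:
    """Processes whose execute function is defined in state s."""
    return {p for p, step in procs.items() if step(s) is not None}


def state_after(procs: Dict[str, Step], s0: State, E: List[str]) -> State:
    """The state s_[E] reached by performing the steps of E from s0."""
    s = s0
    for p in E:
        nxt = procs[p](s)
        if nxt is None:
            raise ValueError(f"{p} is not enabled; not an execution sequence")
        s = nxt
    return s


def is_maximal(procs, s0, E) -> bool:
    return not enabled(procs, state_after(procs, s0, E))


def events(E: List[str]) -> List[Tuple[str, int]]:
    """Events <p, i>: the i-th occurrence of process p in E."""
    count: Dict[str, int] = {}
    out = []
    for p in E:
        count[p] = count.get(p, 0) + 1
        out.append((p, count[p]))
    return out


def make_counter(index: int, limit: int) -> Step:
    """Toy process: increments its own component until it reaches `limit`."""
    def step(s: State) -> Optional[State]:
        if s[index] >= limit:
            return None
        t = list(s)
        t[index] += 1
        return tuple(t)
    return step


procs = {"p": make_counter(0, 2), "q": make_counter(1, 1)}
s0 = (0, 0)
print(events(["p", "p", "q"]), is_maximal(procs, s0, ["p", "p", "q"]))
```

In this toy system no step disables another process and all executions are bounded, in line with the assumptions stated above.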

We will keep the general approach of Optimal DPOR and require the happens-before relation to satisfy a set of properties, collected in Definition 1. These properties are the first point where we diverge from the underlying model for Optimal DPOR [2, Definition 3.2]. In that definition, Properties (3) and (5) need to be weakened, Property (6) needs to be replaced, whereas Property (7) was only required for Source DPOR and is thus dropped.

**Definition 1 (Properties of valid happens-before relations).** *A happens-before assignment, which assigns a unique happens-before relation* →<sub>E</sub> *to any execution sequence* E*, is* valid *if it satisfies the following properties for all execution sequences* E*:*

	- E ≃ E′ *to denote that dom*(E) = *dom*(E′) *and that* E *and* E′ *are linearizations of the same "happens-before" relation, and*
	- [E] *to denote the equivalence class of* E*.*

For the last property, we need to introduce a few definitions. Given →*E*, if e, e <sup>∈</sup> *dom*(E) and e <*<sup>E</sup>* <sup>e</sup> , define ) to denote that <sup>e</sup>→*E*e and <sup>e</sup>--


Now we continue listing properties of valid happens-before relations.

	- *(a) For all* o ∈ O*, it holds that* e →<sub>E</sub> o*,* o ≠ e′*, and* o ↛<sub>E</sub> e′*.*
	- *(b) For all* o, o′ ∈ O *it holds that* o ↛<sub>E</sub> o′*.*
	- *(c) If* E ≃ E′ *then* O′ = *observers*(e, e′, E′) = O*.*
	- *(d) For every prefix* E′ ≤ E *of* E *such that* e, e′ ∈ *dom*(E′)*:*
		- *If* O *is empty, then* e →<sub>E′</sub> e′*.*
		- *If* O *is nonempty, then* e →<sub>E′</sub> e′ *iff dom*(E′) ∩ O ≠ ∅*.*
	- *(e) If* e ≲<sub>E</sub> e′*, then for all sequences* w *such that* E ⊢ w *and all events* e″ ∈ *dom*(E)*:*
		- *If* e″ ↛<sub>E</sub> e*, then* e″ ↛<sub>E.w</sub> e*.*
		- *If* e″ ↛<sub>E</sub> e′*, then* e″ ↛<sub>E.w</sub> e′*.*
	- *(f) For all* e″ ∈ *dom*(E) *such that* e″ →<sub>E</sub> e *it holds that* O ∩ *observers*(e″, e, E) = ∅*.*
	- *(g) If* O = {o} *and* E = E′.o *for some* o *and* E′*, then for any* E″ ≃ E′*, either* e →<sub>E″.o</sub> e′ *or* e′ →<sub>E″.o</sub> e*.*

We give some intuition for the changed properties. Property 3 requires the happens-before assignment to maintain edges in extensions, but allows having fewer edges in prefixes. Property 5 allows execution sequences that reach different states (due to unobserved races) to be considered equivalent. Property 6 summarizes properties for races that require observers. Most requirements are intuitive. Property 6.(d) clarifies Property 3: an "observed" race is included in a sequence only if some observers of the race are also included. Property 6.(e) prevents extensions to an execution sequence from adding edges to the events of a reversible race in such a way that the race cannot be reversed. Property 6.(f) prohibits an observer from creating "dependency chains". Finally, Property 6.(g) requires that an observer observes a fixed set of pairs of events in each execution sequence; a consequence of this is that whether or not some particular race is observed never depends on the ordering of some other pair of events observed by the same observer. All these properties are satisfied by "natural" happens-before assignments for events in message passing programs and most shared memory programs. Limitations include, e.g., models in which the written memory regions of two write operations may overlap without being equal; such pairs of operations need to be treated as unconditionally racing.

### **5 Optimal DPOR with Observers**

We now present a DPOR algorithm with observers that achieves optimal reduction.

In Sect. 3.2 we explained why sleep sets are not suitable when observers are used. We instead introduce a notion of redundancy based solely on the set of explored steps from each state. We will base this notion on a concept defined in Optimal DPOR.

**Definition 2 (Initials and Weak Initials** [2]**).** *For an execution sequence* E.w*, the set I*<sub>[E]</sub>(w) *of processes that are* initials *and the set WI*<sub>[E]</sub>(w) *of processes that are* weak initials *are defined as follows:*

*1.* p ∈ *I*<sub>[E]</sub>(w) *iff there is a sequence* w′ *such that* E.w ≃ E.p.w′*.*

*2.* p ∈ *WI*<sub>[E]</sub>(w) *iff there are sequences* w′ *and* v *such that* E.w.v ≃ E.p.w′*.*

**Definition 3 (Redundant Sequences).** *For an execution sequence* E *and a function done from prefixes of* E *to sets of processes, the set of sequences redundant*(E, *done*) *is defined such that* v ∈ *redundant*(E, *done*) *iff* E.v *is an execution sequence and there is a partitioning* E = w.w′ *of* E *such that some process* p ∈ *done*(w) *is also in WI*<sub>[w]</sub>(w′.v)*.*

The intuition is that if v ∈ *redundant*(E, *done*), then the execution sequence E.v is equivalent to a previously explored execution sequence. In the special case where races do not need observers (i.e., the set of observers for each race is empty), we can define sleep sets in the classical sense by letting p ∈ *sleep*(E) denote that E is of the form E′.v for some v such that p ∈ *done*(E′) and p and v are independent. Then *sleep*(E) consists of all single-process sequences in *redundant*(E, *done*), and v ∈ *redundant*(E, *done*) is equivalent to *sleep*(E) ∩ *WI*<sub>[E]</sub>(v) ≠ ∅.
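Definitions 2 and 3 can be sketched for the special case without observers, where interference is a fixed relation. The code below is an illustration only: it assumes a symmetric, process-level `dependent` predicate (a hypothetical parameter), ignores enabledness, and represents *done* as a dictionary keyed by prefixes. With observers, the happens-before relation would have to be recomputed for each candidate sequence instead.

```python
def initials(w, dependent):
    """Processes whose first step in w has no dependent step before it."""
    result = set()
    for i, p in enumerate(w):
        if p in w[:i]:
            continue  # only the first occurrence of p matters
        if not any(dependent(q, p) for q in w[:i]):
            result.add(p)
    return result


def weak_initials(w, dependent, processes):
    """Initials of w, plus processes absent from w and independent of all of w."""
    result = initials(w, dependent)
    for p in processes:
        if p not in w and not any(dependent(q, p) for q in w):
            result.add(p)
    return result


def redundant(E, done, v, dependent, processes):
    """True if, for some split E = w.w', a process in done(w) is a weak
    initial of w'.v (Definition 3, static-dependence special case)."""
    for k in range(len(E) + 1):
        w, rest = tuple(E[:k]), list(E[k:]) + list(v)
        if done.get(w, set()) & weak_initials(rest, dependent, processes):
            return True
    return False


# p and q are dependent; r is independent of both.
def dependent(a, b):
    return {a, b} == {"p", "q"}


procs = {"p", "q", "r"}
done = {(): {"p"}}  # p has been fully explored from the initial state
print(redundant(["r"], done, ["p"], dependent, procs))  # r.p is covered by p.r
print(redundant(["q"], done, ["r"], dependent, procs))  # q.r is a new behaviour
```

The check scans every prefix of E rather than consulting a single sleep set, which mirrors how redundancy is defined solely in terms of the explored steps from each state.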

If E is an execution sequence, and v and w are sequences of processes, let:


Let us define an *ordered tree* as a pair ⟨B, ≺⟩, where B (the set of *nodes*) is a finite prefix-closed set of sequences of processes, with the empty sequence being the root. The children of a node w, of form w.p for some set of processes p, are ordered by ≺. In ⟨B, ≺⟩, such an ordering between children has been extended to the total order ≺ on B by letting ≺ be the induced post-order relation between the nodes in B. This means that if the children w.p<sub>1</sub> and w.p<sub>2</sub> are ordered as w.p<sub>1</sub> ≺ w.p<sub>2</sub>, then w.p<sub>1</sub> ≺ w.p<sub>2</sub> ≺ w in the induced post-order.
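The induced post-order can be illustrated with a few lines of code. In this sketch a tree is given as a dictionary from each node (a tuple of process names) to the ordered list of processes labelling its children; this representation and the function name `post_order` are assumptions for illustration.

```python
def post_order(children, node=()):
    """List the nodes of an ordered tree in the induced post-order.

    `children[node]` is the ordered list of processes p such that
    node.p is a child of node; the root is the empty sequence ().
    """
    order = []
    for p in children.get(node, []):
        order.extend(post_order(children, node + (p,)))
    order.append(node)  # a node comes after all of its descendants
    return order


tree = {(): ["p1", "p2"], ("p1",): ["q"]}
print(post_order(tree))
```

For this tree the order is p1.q, then p1, then p2, then the root, so that w.p1 ≺ w.p2 ≺ w holds for the root w.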

**Definition 4 (Wakeup Tree).** *Let* E *be an execution sequence, and done be a function from prefixes of* E *to sets of processes. A* wakeup tree after ⟨E, *done*⟩ *is an ordered tree* ⟨B, ≺⟩*, such that the following properties hold*


Property (2) is the same as Optimal DPOR; Property (1) has been modified.

Regarding inserting sequences in a wakeup tree, let ⟨B, ≺⟩ be a wakeup tree after ⟨E, *done*⟩. For any sequence w such that w ∉ *redundant*(E, *done*) we need an operation *insert*<sub>[E]</sub>(w, ⟨B, ≺⟩) that satisfies the following properties:


The *insert*<sub>[E]</sub>(w, ⟨B, ≺⟩) operation can be implemented as follows. Let v be the smallest (w.r.t. ≺) sequence in B such that v ∼<sub>[E]</sub> w. If v is a leaf, *insert*<sub>[E]</sub>(w, ⟨B, ≺⟩) can leave the tree unmodified. Otherwise, let w′ be a shortest sequence such that w ⊑<sub>[E]</sub> v.w′, and add v.w′ as a new leaf, ordered after all already existing nodes in B of form v.w″.

#### **5.1 Algorithm**

Algorithm 1 is a modified and extended version of the plain Optimal DPOR algorithm [2], so that it supports observers. Since sleep sets are no longer an applicable mechanism for avoiding redundant exploration, the algorithm accepts only two arguments: E, the prefix to explore, and *WuT*, the initial wakeup tree after E. It keeps two global variables: *wut*, a mapping from execution sequences to wakeup trees, and *done*, a mapping from execution sequences to sets of processes. For a pair of events e, e′ ∈ *dom*(E) that are in a reversible race (e ≲<sub>E</sub> e′) in E, the algorithm employs the following notation:



The first change compared to Optimal DPOR is in lines 6 to 8, which describe how to construct a wakeup sequence for an observed race, including an observer operation. Second, the test v ∉ *redundant*(E, *done*) on line 11 replaces the test *sleep*(E′) ∩ *WI*<sub>[E′]</sub>(v) = ∅ at the corresponding place in Optimal DPOR. The rest of the algorithm is essentially the same, with initialization, update and propagation of sleep sets removed.

#### **5.2 Correctness and Optimality**

The correctness and optimality of Algorithm 1 are stated in the following theorems.

**Theorem 1 (Correctness of Optimal DPOR with Observers).** *Whenever a call to Explore*(E, *WuT*) *returns during Algorithm 1, then for all maximal execution sequences* E.w*, the algorithm has explored some execution sequence* E′ *which is in* [E.w]*.*

Since the initial call to the algorithm uses the arguments *Explore*(⟨⟩, ⟨{⟨⟩}, ∅⟩), Theorem 1 implies that for all maximal execution sequences E the algorithm explores some execution sequence E′ which is in [E].

**Theorem 2 (Optimality of Optimal DPOR with Observers).** *Algorithm 1 never explores two maximal execution sequences which are equivalent.*

If Algorithm 1 is not at the end of a maximal sequence, it will continue exploring the scheduling either by using information from a wakeup tree (line 15) or by choosing an arbitrary enabled process (line 18). Theorem 2 ensures that all maximal execution sequences reached are non-redundant.

#### **6 Implementations**

We have implemented Algorithm 1 in two SMC tools: Nidhugg and Concuerror.

**Observers in Nidhugg.** Nidhugg [1] is a stateless model checking tool for shared-memory pthreads programs written in C or C++ that operates by interpreting LLVM IR. Nidhugg can also test programs under relaxed memory models (TSO, PSO, and Power), but in this paper we will limit ourselves to testing programs under Sequential Consistency.

In the context of shared memory, the observers extension was used to make races between writes to the same memory location conditional on the existence of a read of that memory location that "observes" those writes. In order to add the observers extension to Nidhugg, the tool was first extended to support Optimal DPOR, as it previously only implemented Source DPOR, which is not easily extended with observers, as discussed in Sect. 3.2. The tool now records symbolic representations of program events that contain enough information to reconstruct the happens-before relation induced by a particular execution. For Source DPOR, these symbolic events are unnecessary if the happens-before relation is stored in vector clocks [18], as it is in Nidhugg. For Optimal DPOR, symbolic events are the most reasonable way to implement tests that check whether a given process is a weak initial of some sequence, which is needed for both the redundancy check and wakeup tree insertion.

To extend this implementation with observers, symbolic events for writes were extended with an "observed"-flag, which is unset until a read that reads the value written by that write is executed. At the end of the execution, we compute the vector clocks of the happens-before relation, only considering two write events to the same memory location as interfering if at least one of them has the "observed"-flag set. Then, Optimal DPOR was modified as described in Sect. 5.1. The check whether a wakeup sequence is redundant on line 11 is implemented using sleep sets extended with processes conditionally sleeping unless an address is read, and a set of addresses that must be read, without intervening writes, before the end of the program.
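The rule for the "observed"-flag can be sketched as follows. This is not Nidhugg's code; it only mirrors the description above on a list of symbolic events, and the `Event` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Event:
    thread: str
    kind: str  # "write" or "read"
    addr: str
    observed: bool = False  # meaningful for writes only


def mark_observed(trace):
    """Set the flag on each write whose value is read by some later read."""
    last_write = {}
    for ev in trace:
        if ev.kind == "write":
            last_write[ev.addr] = ev
        elif ev.addr in last_write:
            last_write[ev.addr].observed = True
    return trace


def interfering_writes(trace):
    """Index pairs of writes by different threads to the same address
    that interfere: at least one of the two must be observed."""
    writes = [(i, e) for i, e in enumerate(trace) if e.kind == "write"]
    pairs = []
    for a in range(len(writes)):
        for b in range(a + 1, len(writes)):
            (i, e1), (j, e2) = writes[a], writes[b]
            if (e1.addr == e2.addr and e1.thread != e2.thread
                    and (e1.observed or e2.observed)):
                pairs.append((i, j))
    return pairs


trace = mark_observed([
    Event("p", "write", "x"),
    Event("q", "write", "x"),
    Event("s", "write", "x"),
    Event("r", "read", "x"),
])
print(interfering_writes(trace))
```

In this trace only the last write is observed, so the first two writes race with it but not with each other. Without the read, no pair of writes would be considered interfering.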

**Observers in Concuerror.** Concuerror [8] is a stateless model checking tool for Erlang, a functional programming language based on the actor model of concurrency [4]. In Erlang, actors are realized by language-level processes implemented by the runtime system instead of being directly mapped to OS threads. Each Erlang process communicates with other processes via asynchronous message passing. Messages are placed in the mailbox of the receiving process in the order they are delivered. A process can consume messages using *selective* receive, which is a *blocking* operation when the mailbox does not contain any matching message, unless a timeout clause is specified. If multiple messages can match, the oldest message is picked from the mailbox.

Concuerror already implemented Optimal DPOR, but treated any two message deliveries to the same mailbox as interfering. With the extension, Concuerror uses receives as observers of sends. When examining a complete scheduling, an extra pass is performed, annotating each message delivery event with the patterns that were used in the receive that picked the message (if present) and the receive order. If the message of a later delivery matches any of the pattern annotations of an earlier delivery, the deliveries interfere. The *notobs* sequence is constructed from all the events that lead up to the corresponding receive (which is the observer), excluding events in the *notdep* sequence. Because the resulting wakeup sequence contains fewer events, observer information is recomputed, and then all the earlier *done* sets are checked for weak initials of the wakeup sequence, exactly as described in Algorithm 1.
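The annotation pass can be illustrated on a single mailbox. The sketch below is a simplification, not Concuerror's implementation: messages are (tag, sender) pairs, a pattern is just a tag, all deliveries precede the receives, and the receive order annotation is omitted. The function names are hypothetical.

```python
def annotate(deliveries, receives):
    """Map each picked delivery index to the patterns of the receive that
    picked it. `deliveries` is in delivery order; `receives` lists the
    patterns of each receive in the order the receiver executes them."""
    remaining = list(range(len(deliveries)))
    picked_by = {}
    for patterns in receives:
        for idx in remaining:
            if deliveries[idx][0] in patterns:  # oldest matching message
                picked_by[idx] = patterns
                remaining.remove(idx)
                break
    return picked_by


def interfering_deliveries(deliveries, receives):
    """Pairs (earlier, later) where the later message matches a pattern
    annotated on the earlier delivery."""
    picked_by = annotate(deliveries, receives)
    pairs = []
    for early, patterns in sorted(picked_by.items()):
        for late in range(early + 1, len(deliveries)):
            if deliveries[late][0] in patterns:
                pairs.append((early, late))
    return pairs


# The scheduling p.p.q.s.r.r of the example in Sect. 3.2.
deliveries = [("B", "p"), ("A", "p"), ("A", "q"), ("B", "s")]
receives = [["A"], ["B"]]
print(interfering_deliveries(deliveries, receives))
```

For the example of Sect. 3.2 this reports two interfering pairs: p's first send with the send of s, and p's second send with the send of q.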

### **7 Experimental Results**

We report experimental results that compare the performance of two algorithms: Optimal DPOR (denoted in the tables as "optimal") and Optimal DPOR with Observers (denoted as "observers"). We ran all benchmarks on a desktop with an i7-3770 CPU (3.40 GHz) with 16 GB of RAM running Debian 4.12.0-2-amd64 and LLVM 3.8.1. The machine has four physical cores, but presently both tools use only one of them.

**Observers in Nidhugg.** Table 1 shows the effect of observers on shared memory C/pthread programs. We used two kinds of programs: (1) synthetic benchmarks similar to those of Sect. 2, and (2) programs from SV-COMP and/or from "similar" papers. We report the number of traces that the two algorithms explore, the time it takes to explore them, and the memory used (although this number is not interesting for an SMC tool).


**Table 1.** Performance of Optimal DPOR vs. Optimal DPOR with Observers in Nidhugg.


For lastwrite(n), we see a reduction in the number of interleavings explored from n! to n, as explained in Sect. 2. For floating read(n), optimal shows the predicted (n + 1)! interleavings, and for n = 2, observers reduce the interleaving count from 6 to 5 as expected. In general, the benchmark has n × 2<sup>n−1</sup> + 1 interleavings with observers. Notice that any technique that differentiates equivalence classes by the partial order of program steps must explore at least as many interleavings or violate Property 4. The next two programs (apr 1 and fib) are examples of programs for which observers have no effect. We see that the extra overhead is very moderate for both programs.
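The two closed-form counts quoted for the floating read benchmark can be tabulated with a short script. It evaluates only the formulas stated above and does not rerun any benchmark; the function names are chosen for this illustration.

```python
from math import factorial


def optimal_count(n):
    """Interleavings explored without observers: (n + 1)!."""
    return factorial(n + 1)


def observers_count(n):
    """Interleavings explored with observers: n * 2^(n-1) + 1."""
    return n * 2 ** (n - 1) + 1


for n in range(1, 7):
    print(n, optimal_count(n), observers_count(n))
```

For n = 2 the formulas give 6 and 5, matching the counts mentioned in the text; the gap widens quickly, since the first count grows factorially and the second only exponentially.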

In the last benchmark (lamport), we see that observers improve performance. As Nidhugg does not implement await statements (which are used by lamport), it emulates these with assumes. In such cases, Nidhugg might explore some traces in which these assumptions are violated. We list those traces separately, so for this benchmark the "Traces Explored" columns show a+b entries, which means that Nidhugg explored a + b traces but b of those times an assume statement was violated.

**Observers in Concuerror.** Table 2 shows the effect of observers in message passing programs; we omit memory used, as both algorithms have similar requirements.

**Table 2.** Comparison of Optimal DPOR vs. Optimal DPOR with Observers in Concuerror.


not selective(*n*). *n* processes send messages to a process that can receive any message sent to it.

selective(*n*). This is a generalized version of the last example of Sect. 2. A process uses pattern matching to choose between messages from *n* different senders.

lock(*n*). This is a program in which *n* workers acquire and release a lock simulated by an Erlang process. When using observers, it has *n*! schedulings. Without observers the number of schedulings is higher.

poolboy. A benchmark created from a unit test of a worker pool library [2].


The two benchmarks on the left sub-table confirm the behaviour we expect. When receives are not selective, the number of traces explored by both algorithms is n!. With selective receive (selective benchmark) observers explore only one trace.

The first program on the right sub-table (lock) is originally a shared-memory program that, when translated to Erlang, simulates locks using message passing. To acquire the lock, a process sends a message with its identifier to the "lock process" and then waits for a reply. Upon receiving the acquire message, the lock process uses the identifier to reply and then waits for a release message. Other acquire messages become queued in the mailbox of the lock process. Upon receiving the release message, the lock process loops back to the start, retrieving the next acquire message and notifying the next process. Notice that, without observers, the delivery of the release message of a process interferes (redundantly) with the delivery of acquire messages of other processes, unlike acquire operations on true locks, which cannot be executed before a release operation (such messages were treated exceptionally in the evaluation of Optimal DPOR). Observers remove the need for special handling: the receive statements are enough to precisely determine which pairs of send operations are interfering.

The next two table rows (poolboy and gproc) show results from "real" Erlang programs. We see that observers provide a moderate reduction in both the number of traces that need to be explored and the time required.

Finally, the last program (corfu-repair) is the one that triggered this work. As can be seen in the table, observers allow Concuerror to complete SMC in a bit more than two days, while without observers the tool needs to explore exactly 24 times as many traces, taking more than 42 days to finish.

### **8 Related Work**

POR techniques have been continuously evolving w.r.t. how they determine interference. Refining the conditions under which higher-level operations interfere has been shown to have significant impact, regardless of whether the state in which such operations are executed is also a parameter or not [13]. In this work, we have extended this idea, parameterizing the interference between operations using distinct observer operations.

DPOR techniques have also been extended to take advantage of special properties of the underlying concurrency model. For the actor model, the transitivity of the dependency relation for send operations has been exploited to defer early planning of interleavings [21]. This improvement is orthogonal to Optimal DPOR (and to our extension), as it reduces the number of wakeup sequences that are added "early" in an exploration. For event-driven systems, it has been shown [17] that two post operations to an event dispatch queue need not be considered dependent: reordering of such operations can be decided later, upon detection of interference between other operations within the respective event handlers. However, this treatment applies only under a specific interpretation of 'message passing' that exploits additional semantic structure of an actor's mailbox. Our technique is applicable to a wider spectrum of programs.

Context-Sensitive DPOR [3] uses an external procedure to decide whether alternative schedulings would lead to identical states and, like Optimal DPOR with Observers, is also able to achieve exponential reduction in certain cases. However, since it needs to compare states, it is an inherently stateful technique, in contrast to our technique, which inspects only one trace at a time to lazily construct reversible races.

Data-Centric DPOR (DC-DPOR) [7] is an SMC technique that explores a related but different notion of observability. It defines two executions to be equivalent if each read reads from ("observes") the same write in both executions. In contrast, our notion of observability is based on observing *interference of operations*, not just individual writes. DC-DPOR's resulting equivalence relation is coarser than ours, which is based on Mazurkiewicz traces. However, DC-DPOR is optimal only for programs with acyclic communication graphs, while being non-optimal otherwise. Also, DC-DPOR models message passing using locks and shared memory, which at best gives as few traces as Optimal DPOR gives without the improvements presented in this paper.

### **9 Concluding Remarks**

In this paper we presented an extension to the Optimal DPOR algorithm for SMC that uses observability to refine which operations are considered as interfering. We described the challenges and motivated the necessary modifications, gave a formal description of the algorithm and the theory behind it and reported on two implementations in SMC tools, demonstrating that Optimal DPOR with Observers can achieve significantly better reduction in both shared memory and message passing programs.

**Acknowledgments.** This work was carried out within the Linnaeus centre of excellence UPMARC (Uppsala Programming for Multicore Architectures Research Center), and was partly supported by grants from the Swedish Research Council.

**Data Availability Statement.** The versions of Nidhugg and Concuerror, as well as all the programs we used to obtain the experimental results of Tables 1 and 2 are available in the Figshare repository [6]. Also included in the artifact are instructions on how to use it to reproduce the results reported in this paper. As per the TACAS 2018 submission rules, the artifact is designed for use with the TACAS 2018 Artifact Evaluation Virtual Machine [14], although, as source code is included, it can probably be used on any Linux platform. We refer to the documentation of the respective tool on how to compile them from source code; the tools may of course evolve over time, but the way to build them will not change significantly.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Structurally Defined Conditional Data-Flow Static Analysis**

Elena Sherman<sup>1(✉)</sup> and Matthew B. Dwyer<sup>2</sup>

<sup>1</sup> Boise State University, Boise, ID 83706, USA elenasherman@boisestate.edu

<sup>2</sup> University of Nebraska - Lincoln, Lincoln, NE 68588, USA matthewbdwyer@unl.edu

**Abstract.** Data flow analysis (DFA) is an important verification technique that computes the effect of data values propagating over program paths. While more precise than flow-insensitive analyses, such an analysis is time-consuming.

This paper investigates the acceleration of DFA by structural decomposition of the underlying control flow graph. Specifically, we explore the cost and effectiveness of dividing program paths into subsets by partitioning path suffixes at conditional statements, applying a DFA on each subset, and then combining the resulting invariants. This yields a family of independent DFA problems that are solved in parallel and where the partial results of each problem represent safe program invariants.

Empirical evaluations reveal that depending on the DFA type and its conditional implementation the invariants for a large fraction of program points can be computed in less time than traditional DFA. This work suggests a strategy for an "anytime DFA" algorithm: computing safe program invariants as the analysis proceeds.

### **1 Introduction**

Software developers use static analyses as a supplement to traditional dynamic testing approaches. Tools such as AbsInt Astrée [1], Facebook Infer [2], and MathWorks Polyspace<sup>1</sup> are becoming standard parts of development workflows. Advances in program analysis and theorem proving have helped static program analysis become more feasible for verification of general-purpose software.

The power of static analysis to consider all program behaviors follows from its ability to safely over-approximate program behaviors by abstracting the concrete domain of program variables and the programming language semantics. But at the same time its over-approximating nature causes static analysis to identify some property violations as uncertain. The reason for this uncertainty is that a static analysis cannot tell if a violation happens on a feasible or an infeasible, i.e., strictly over-approximating, program behavior. This inconclusiveness is unacceptable since each potential violation must be examined further. An automatic solution to the elimination of false positive violations is to increase the precision of a static analysis, i.e., improve the analysis so it considers fewer infeasible behaviors.

<sup>1</sup> http://www.mathworks.com/products/polyspace.html.

© The Author(s) 2018

D. Beyer and M. Huisman (Eds.): TACAS 2018, LNCS 10806, pp. 249–265, 2018. https://doi.org/10.1007/978-3-319-89963-3\_15

However, improving analysis precision generally increases analysis cost in terms of running time and memory consumption. A common approach to address this problem is to decompose the program's state space into several subspaces and perform analysis on each separately. What distinguishes those techniques are the underlying decomposition methods.

One approach focuses on making a precise static analysis scalable by decomposing a large program into modules like procedures and classes, and allowing the analysis to examine each partition independently. Next, the analyzed information of each module is composed together to obtain the result of the whole program analysis. In the literature [3,4] this method is referred to as *partial static analysis*.

Another approach aims to improve the scalability of precise analysis by permitting the analysis to explore only those program states for which it is adequately precise, i.e., able to provide a definitive result. In the literature [5–7] this approach is called *conditional static analysis* (CSA) since the permitted states are described by a condition θ expressed as a logical formula. In such a framework an analysis verifies a program under some assumptions, e.g., that there are no null pointer exceptions or that a pre-condition on input values holds. Next, another analysis attempts to prove these assumptions by showing that the states that do not satisfy θ are either not reachable or do not lead to property violations. In prior work the condition θ is either determined from the analysis design [5,6], where θ is applicable to all program states, or determined during program analysis execution [7], where θ is composed of the conditions assumed to hold for a certain set of states.

While previous work on CSA focuses on finding values of θ that ensure an increase in analysis precision, in this paper we explore the decomposition of the program's state space in order to improve the efficiency of the analysis. We decompose the program's state space based on the program's control flow graph (CFG), i.e., on the program's structural information. Each partition corresponds to a set of paths expressed as a set of CFG branches π. This permits a path-defined, or π-defined, CSA to compute invariants for each π independently and in parallel. While one can use a logical formula θ as a precondition to restrict program input values to those that follow a particular path, we conjecture two primary advantages of structural decomposition. First, π is expressed directly as a subset of CFG branches, whereas computing an equivalent θ, expressing constraints on input values, would require complex value-propagating analyses. Second, because π is structural, its effect on the analysis is independent of the abstract domain, whereas even an equivalent θ may not be effective in preventing values from flowing along a branch due to over-approximation by the abstract domain.

The contributions of this paper are presentation of:


In the next section, we provide an overview of the structural CSA approach and pose our research questions. After that we formalize CSA in Sect. 3 and demonstrate in Sect. 4 two different ways of implementing CSA in an existing program analysis framework. In Sect. 5 we present our approach to partitioning a CFG. Then we present our experiments and discuss related work.

#### **2 Overview**

We begin with an example of a traditional data-flow analysis. Data-flow analysis calculates some information for each point in a program based on the program structure and the language semantics. The calculated facts, i.e., program invariants, are then later used to reason about program properties, usually safety properties, which must hold on all feasible program executions. Data-flow analyses that compute invariants that are satisfied by all paths are called *must* analyses. In our example we show how a data-flow analysis computes invariants for each program statement.

Consider a program and its corresponding CFG in Fig. 1(a). In this example x is an integer variable. The edges of the CFG are labeled with T for *true* branches and F for *false* branches of the conditional statement.

In order to calculate invariants, *static analysis* (SA) works with abstract values of x, which are composed of the elements of an abstract domain. For example, the *signs* abstract domain has three elements {+, 0, −}: 0 denotes the singleton set {0} of concrete values, + denotes positive values, and − denotes negative values. If SA employs the *signs* abstract domain then the values of x are expressed as a set containing any of those three elements, including the special cases {} ≡ ⊥ for no values and {+, 0, −} ≡ ⊤ for all values.

SA starts by assigning x to ⊤ at the CFG's entry point, since x can have any concrete value. Upon encountering the conditional statement SA computes invariants for x along the true branch, then along the false branch, and then merges these values before the return statement. The left CFG in Fig. 1 shows the result of the analysis where the CFG's edges are annotated with computed invariants for x. Clearly, the computations along these two branches are independent of each other and could be done simultaneously, thus reducing the computational time. This observation is the main idea behind our approach.

In other parallel SA approaches, which we discuss in Sect. 7, the parallel computation is done inside a full SA. At a merge point, where the analysis combines the results of the two branches, such a parallel SA must wait for every branch to complete before proceeding further, which reduces parallelism.

**Fig. 1.** Source code and its CFG (a); analysis examples: *signs* analysis result (b), CSA *sign* analysis result for set paths with *1t* prefix (c) and *1f* prefix (d)

Moreover, if we can analyze the true and the false branches independently then the invariants computed along the true branch could be made available to a user even sooner. This observation is another inspiration for designing an "anytime DFA", which provides sound information about some of the program's invariants.

As mentioned, in general, it would be difficult to compute a precondition θ that restricts the input values of x to only those that would take the CSA computation to a particular set of branches. However, in path-defined CSA those branches can be stated explicitly. In our example we can have two sets of paths: one defined by π<sub>1</sub> = {1t}, i.e., take only the true branch of the first conditional statement, and one defined by π<sub>2</sub> = {1f}, i.e., take only the false branch of the first conditional statement. The results of these two path-defined CSAs are shown in (c) and (d) of Fig. 1, respectively. We can see that the union of the abstract element sets for the π<sub>1</sub> CSA and the π<sub>2</sub> CSA on the corresponding edges yields the same invariants as the full analysis, that is, CSA produces sound results. Section 3 formalizes the conditions under which soundness holds in CSA. Overall, CSA can potentially provide two main benefits to a user: (1) speeding up the analysis through parallelism *and* (2) delivering fast, useful feedback.

One of the objectives of our work is to investigate the efficiency of two π-defined CSA implementations in an existing data-flow framework and their ability to compute sound invariants at intermediate points in the analysis.
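The union property described above can be illustrated with a minimal sketch. Since the source code of Fig. 1(a) is not reproduced in the text, the program below (`if (x > 0) x = x - 1; else x = -x; return x;`) is a hypothetical stand-in, and all function and edge names other than `1t` and `1f` are assumptions made for illustration.

```python
# Signs abstract domain: an abstract value is a set of the elements "+", "0", "-".
TOP = frozenset({"+", "0", "-"})   # all concrete values
BOTTOM = frozenset()               # no concrete values

def assume_positive(v):
    """True branch of the hypothetical condition `x > 0`."""
    return v & {"+"}

def assume_not_positive(v):
    """False branch of the hypothetical condition `x > 0`."""
    return v & {"0", "-"}

def decrement(v):
    """Abstract effect of `x = x - 1` in the signs domain."""
    result = set()
    if "+" in v:
        result |= {"+", "0"}
    if v & {"0", "-"}:
        result.add("-")
    return frozenset(result)

def negate(v):
    """Abstract effect of `x = -x` in the signs domain."""
    swap = {"+": "-", "-": "+", "0": "0"}
    return frozenset(swap[e] for e in v)

def analyze(pi=frozenset()):
    """Full analysis when pi is empty; path-defined CSA for pi = {"1t"} or {"1f"}."""
    entry = TOP
    # An edge is set to BOTTOM when its opposite branch is listed in pi.
    t_edge = BOTTOM if "1f" in pi else assume_positive(entry)
    f_edge = BOTTOM if "1t" in pi else assume_not_positive(entry)
    t_out = decrement(t_edge)
    f_out = negate(f_edge)
    return {"1t": t_edge, "1f": f_edge,
            "t_out": t_out, "f_out": f_out,
            "return": t_out | f_out}

full = analyze()
csa_1 = analyze({"1t"})
csa_2 = analyze({"1f"})
```

On every edge, `csa_1[k] | csa_2[k]` equals `full[k]`, which mirrors the observation that the two conditional runs together recover the invariants of the full analysis.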

To evaluate efficiency improvements we consider a traditional reaching definitions (RD) analysis and value-based data flow analysis (VB) for disjoint domains [8] similar to one used in the above example. Our approach automatically generates a set of π for each method based on heuristics discussed in Sect. 5. Then based on π it recombines CSA in the order of its completion and then compares the result of each combination step to the results of the full SA. Through our experiments we aim to answer the following research questions:


We answer these research questions through an extensive empirical evaluation on real-world programs.

### **3 Conditional Analysis**

In this section we first present the traditional monotone framework for data flow analysis followed by the discussion of the necessary changes that extend it to a conditional data flow framework. This section also outlines the approach of composing the unconditional result from conditional ones.

We use a data flow analysis framework similar to the one presented in [9] for an analysis A, extended to express branch-sensitive analysis, where the outgoing flow of a statement l ∈ CFG<sub>P</sub> is defined for each of its outgoing edges (l, l′) ∈ CFG<sub>P</sub>. Thus, the following parameters define A.


Then the set of equations for forward A is defined as follows on entry and exit of each statement l ∈ CFG<sub>P</sub>:

$$\mathcal{A}\_{in}(l) = \bigsqcup \{ \mathcal{A}\_{out}(l', l) \mid (l', l) \in CFG\_P \} \sqcup \iota\_E^l \tag{1}$$
 
$$\text{where } \iota\_E^l = \begin{cases} \iota & \text{if } l \in E \\ \bot & \text{if } l \notin E \end{cases}$$
 
$$\mathcal{A}\_{out}(l, l') = f\_{ll'}(\mathcal{A}\_{in}(l)), (l, l') \in CFG\_P$$

where ⊔ is the least upper bound operator and ⊥ is the bottom element of D<sub>A</sub>, for which ∀d ∈ D<sub>A</sub> : ⊥ ⊔ d = d and ∀(l, l′) ∈ CFG<sub>P</sub> : f<sub>ll′</sub>(⊥) = ⊥. For safety, ⊥ corresponds to the empty set of concrete values and ⊤ to the set containing all concrete values. The value of ι is assigned to ⊤, i.e., the analysis considers all possible input values for a program. The solution of the above set of equations provides the result of the analysis for P.

In our work we express a condition for DFA as a condition that identifies the set of paths to be analyzed, π, which defines a CFG partition. We describe CSA as a special case of A, which we denote as A<sup>π</sup>. Thus, a traditional data flow analysis is A = A<sup>∅</sup>; branches not specified in π are explored fully. For our formulation of CSA, the edges in π are not nested inside a loop.

We have chosen π to be represented by a set of branch edges in CFG<sub>P</sub>, at most one for each conditional statement l, which the analysis must include while excluding their counterparts. If l has l′ and l″ as its true and false targets, respectively, then π can contain the edge (l, l′), or the edge (l, l″), or neither of them. To capture the relation between the opposite branches of l we designate (l, l′) = ¬(l, l″) and vice versa (l, l″) = ¬(l, l′). If (l, l′) ∈ π then the values of all variables x<sub>i</sub> incoming to the target of its opposite edge, l″, i.e., along edge ¬(l, l′), are set to ⊥. For brevity, we refer to this case, i.e., when ∀i : x<sub>i</sub> = ⊥, as the ⊥ state. The ⊥ values of the infeasible edges are propagated further to their successors, thereby excluding them from the analysis. The same principle applies when the opposite edge (l, l″) ∈ π. When neither edge is present in π the analysis treats them in its usual manner, i.e., propagates the information through both branches.

With these path-based conditions we can now write the set of equations of the conditional data flow framework for an analysis A<sup>π</sup>:

$$\mathcal{A}\_{in}^{\pi}(l) = \bigsqcup \{ \mathcal{A}\_{out}^{\pi}(l',l) \mid (l',l) \in CFG\_P \} \sqcup \iota\_E^l \tag{2}$$

$$\text{where } \iota\_E^l = \begin{cases} \top & \text{if } l \in E \\ \bot & \text{if } l \notin E \end{cases}$$

$$\mathcal{A}\_{out}^{\pi}(l,l') = \begin{cases} f\_{ll'}(\mathcal{A}\_{in}^{\pi}(l)) & \text{if } (l,l') \in CFG\_P \text{ and } \neg(l,l') \notin \pi \\ \bot & \text{if } (l,l') \in CFG\_P \text{ and } \neg(l,l') \in \pi \end{cases}$$

Let Π be the set of path-based conditions for an analysis A. Executing A with different conditions π<sub>j</sub> ∈ Π produces a set of conditional analyses A<sup>π<sub>j</sub></sup>. The solution for an l ∈ CFG<sub>P</sub> over Π can be expressed as the meet over all maximal fixed point computations (MFP) produced by each A<sup>π<sub>j</sub></sup>, which, when equal to the MFP for A, means that SA and CSA produce the same results.

$$\bigsqcup\_{\pi\_j \in \Pi} MFP\_{\mathcal{A}^{\pi\_j}}(l) = MFP\_{\mathcal{A}}(l) \tag{3}$$

Since SA performs the computation over all program execution paths, in order for CSA to be sound it must ensure the same. For example, consider two conditions {(l, l′)} and {¬(l, l′)}. The conditional analysis A<sup>{(l, l′)}</sup> analyzes all possible input values for the set of paths containing the true branch of l while A<sup>{¬(l, l′)}</sup> does it for the set of paths containing the false branch of l. Thus, together A<sup>{(l, l′)}</sup> and A<sup>{¬(l, l′)}</sup> analyze all program paths. To formalize the soundness of CSA, we express π as a boolean function g<sub>π</sub> as follows.

Each true edge in CFG<sub>P</sub> is mapped to a boolean variable x<sub>i</sub> and each false edge is mapped to ¬x<sub>i</sub>. Then edges in π are mapped to a set of literals and g<sub>π</sub> is expressed as a conjunction of those literals. In our example if (l, l′) is mapped to x<sub>1</sub> then g<sub>{(l, l′)}</sub> := x<sub>1</sub> and g<sub>{¬(l, l′)}</sub> := ¬x<sub>1</sub>. The union of these two sets of paths is equivalent to the disjunction of g<sub>{(l, l′)}</sub> and g<sub>{¬(l, l′)}</sub>. Thus, the combination of arbitrary π<sub>1</sub> and π<sub>2</sub> is given as g<sub>π<sub>1</sub></sub> ∨ g<sub>π<sub>2</sub></sub> ≡ π<sub>1</sub> ∪ π<sub>2</sub>.

Π yields a sound CSA if ⋁<sub>π<sub>j</sub>∈Π</sub> g<sub>π<sub>j</sub></sub> is a tautology. To maximize the efficiency of CSA, the conditions in Π should be pairwise disjoint, thereby eliminating duplicate computation.

$$\forall \pi\_i, \pi\_j \in \Pi \text{ and } \pi\_i \neq \pi\_j: g\_{\pi\_i} \land g\_{\pi\_j} = false$$

Therefore, in order for the analysis to be sound and efficient, the partition algorithm should generate a set Π that satisfies these two constraints. We discuss our partitioning algorithm in Sect. 5.
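The two constraints on Π can be checked by brute force for small examples. The following is a minimal sketch, assuming each π is encoded as a dictionary from a conditional-statement id to the branch outcome it requires; the function names and this encoding are illustrative assumptions, not part of the framework described here.

```python
from itertools import product

def g(pi, assignment):
    """Evaluate g_pi: the conjunction of the literals that pi induces."""
    return all(assignment[cond] == outcome for cond, outcome in pi.items())

def check_partitions(partitions):
    """Return (sound, disjoint) for a list of path conditions.

    sound:    the disjunction of all g_pi is a tautology.
    disjoint: no truth assignment satisfies two different g_pi.
    """
    conds = sorted({c for pi in partitions for c in pi})
    sound, disjoint = True, True
    # Enumerate every truth assignment of the boolean edge variables.
    for values in product([True, False], repeat=len(conds)):
        assignment = dict(zip(conds, values))
        hits = sum(g(pi, assignment) for pi in partitions)
        sound = sound and hits >= 1
        disjoint = disjoint and hits <= 1
    return sound, disjoint

# The set {c1f}, {c1t, c2f}, {c1t, c2t} used as an example in Fig. 2.
example = [{"c1": False}, {"c1": True, "c2": False}, {"c1": True, "c2": True}]
result = check_partitions(example)
```

The enumeration is exponential in the number of conditionals mentioned in Π, so it only serves to make the definitions concrete.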


### **4 Implementations of Conditional Analysis**

Static analysis developers commonly solve Eq. 1 using an iterative work-list algorithm that propagates the abstract values from the entry nodes l ∈ E, usually the single entry node of a program, to the rest of the nodes while computing A<sub>in</sub> and A<sub>out</sub> flow values. The algorithm terminates when for each node in the CFG its A<sub>in</sub> and A<sub>out</sub> are unchanged.

Algorithm 1 sketches a basic work-list algorithm for a branch-sensitive data-flow analysis where, for brevity, A<sub>in</sub> and A<sub>out</sub> are denoted as in and out, respectively. A work-list data structure w keeps track of CFG nodes whose in values changed in the previous iteration and, thus, require recalculation. The computation reaches a fixed point when no changes in in are detected, which corresponds to w becoming empty. At each iteration a node l is removed from the work-list w, its incoming flows are calculated (lines 4–7), and its new outgoing flow is recalculated using the transfer function f (line 8) for each of its successors. That is, outNew is an array where each element contains the outgoing flow to one of l's successors. For example, a conditional statement would have its first element associated with the true branch and its second element associated with the false branch. Lines 9–16 determine the changes in the outgoing flows for each of l's successors by comparing the new and old values of out and insert the affected successors back into w.
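The work-list scheme described above can be sketched as follows. This is not a transcription of Algorithm 1: the CFG encoding, the FIFO queue (used here instead of a quasi-topological order), the signs transfer function, and the example program `if (x > 0) x = x - 1; else x = -x; return x;` are all assumptions made for illustration.

```python
from collections import deque

BOTTOM = frozenset()
TOP = frozenset({"+", "0", "-"})

def solve(cfg, entry, iota, transfer):
    """Branch-sensitive work-list solver; join is set union, one out value per edge."""
    preds = {n: [] for n in cfg}
    for l, succs in cfg.items():
        for s in succs:
            preds[s].append(l)
    out = {(l, s): BOTTOM for l in cfg for s in cfg[l]}
    in_ = {n: BOTTOM for n in cfg}
    w = deque([entry])
    while w:
        l = w.popleft()
        # Merge the incoming flows of l (plus the initial value at the entry).
        value = iota if l == entry else BOTTOM
        for p in preds[l]:
            value = value | out[(p, l)]
        in_[l] = value
        # Recompute the outgoing flow for every successor edge of l.
        for s in cfg[l]:
            new = transfer(value, l, s)
            if new != out[(l, s)]:
                out[(l, s)] = new
                if s not in w:
                    w.append(s)
    return in_, out

def signs_transfer(value, l, s):
    """Transfer function of the hypothetical program over the signs domain."""
    if (l, s) == (1, 2):          # true branch of `x > 0`
        return value & {"+"}
    if (l, s) == (1, 3):          # false branch of `x > 0`
        return value & {"0", "-"}
    if l == 2:                    # x = x - 1
        result = set()
        if "+" in value:
            result |= {"+", "0"}
        if value & {"0", "-"}:
            result.add("-")
        return frozenset(result)
    if l == 3:                    # x = -x
        return frozenset({"+": "-", "-": "+", "0": "0"}[e] for e in value)
    return value

# Node 1: conditional, node 2: true block, node 3: false block, node 4: return.
cfg = {1: [2, 3], 2: [4], 3: [4], 4: []}
in_, out = solve(cfg, 1, TOP, signs_transfer)
```

Keeping one `out` value per edge, rather than per node, is what makes the solver branch-sensitive: the two successors of the conditional receive different flows.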

In order to further improve the efficiency of the work-list algorithm, an analysis framework takes into consideration the ordering of nodes in the CFG. It ensures that the nodes in w appearing topologically before a given node are processed first. Since the CFG can be a cyclic graph, the framework populates w using a quasi-topological ordering algorithm similar to the one presented in Algorithm 2. The node removal and insertion operations on w preserve the CFG's quasi-topological ordering.

A program analysis framework provides analysis developers with implementations of these work-list and ordering algorithms. The developers instantiate their analyses by providing implementations for the merge and f functions, as well as an abstract domain and initial flow values. We present two approaches for implementing CSA in such an analysis framework.

The first approach, CSA<sup>1</sup>, uses the transfer function f to set the outgoing flows to the infeasible branches and their successors to ⊥. Algorithm 3 details that approach. Here π is a global variable which, in line 3, determines whether the outgoing flow for a successor should be set to ⊥ or computed using f(in, l, s) of the full SA. Extending an analysis framework to implement CSA in f is straightforward and does not require analysis developers to further understand the framework's implementation. However, CSA<sup>1</sup> does perform extra computations along infeasible program paths.
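The idea behind the first approach can be sketched as a wrapper around an existing transfer function. This is only an illustration under assumptions: edges are encoded as `(l, s)` pairs, `opposite` is a hypothetical map from a branch edge to its counterpart, and π is passed through a closure rather than held in a global variable.

```python
BOTTOM = frozenset()

def make_csa1_transfer(f, pi, opposite):
    """Wrap the transfer function f of the full analysis for a condition pi.

    An edge (l, s) is excluded when its opposite branch edge is in pi,
    matching the second case of Eq. 2.
    """
    def conditional_f(in_value, l, s):
        if opposite.get((l, s)) in pi:
            return BOTTOM
        return f(in_value, l, s)
    return conditional_f

def identity(in_value, l, s):
    """Stand-in transfer function that leaves the flow unchanged."""
    return in_value

# Node 1 is a conditional with true target 2 and false target 3.
opposite = {(1, 2): (1, 3), (1, 3): (1, 2)}
f_pi = make_csa1_transfer(identity, {(1, 2)}, opposite)
value = frozenset({"+"})
```

With π = {(1, 2)}, the wrapped function returns ⊥ on the false edge (1, 3) and delegates to `f` everywhere else. The solver still visits the successors of the excluded edge, which is the extra work that the text attributes to this approach.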

**Fig. 2.** Combining selected conditional statement c<sub>2</sub> and CFG (left) to produce an abstract graph (right) encoding Π = {{c1f}, {c1t, c2f}, {c1t, c2t}}

The second approach, CSA<sup>2</sup>, addresses this potential performance drawback by modifying the quasi-topological DFS search as shown in Algorithm 4. The algorithm does not traverse the CFG down the paths of the excluded branches (line 5), thus assigning to w only those nodes that are in π. When a node is inserted back into w (Algorithm 1, line 13), only the nodes in π are inserted into w at their proper positions. The CSA<sup>2</sup> implementation requires that analysis developers have an advanced understanding of the analysis framework, i.e., of the algorithms and data structures used in the quasi-topological ordering. However, this approach only iterates over the nodes that are defined in π. We have implemented both approaches, and in Sect. 6 we empirically compare them. In the next section we present our approach to partitioning a CFG into a set of partitions Π.
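A pruned ordering of this kind can be sketched with a depth-first search that skips excluded branch edges. Algorithms 2 and 4 are not reproduced in the text, so the sketch below uses reverse post-order as an assumed stand-in for the quasi-topological order; the edge encoding and the `opposite` map are likewise illustrative.

```python
def pruned_order(cfg, entry, pi, opposite):
    """Reverse post-order of the nodes reachable without taking excluded edges."""
    seen, post = set(), []

    def dfs(n):
        seen.add(n)
        for s in cfg[n]:
            # Skip an edge whose opposite branch is required by pi.
            if opposite.get((n, s)) in pi:
                continue
            if s not in seen:
                dfs(s)
        post.append(n)

    dfs(entry)
    return post[::-1]

# Node 1 is a conditional with true target 2 and false target 3; both reach 4.
cfg = {1: [2, 3], 2: [4], 3: [4], 4: []}
opposite = {(1, 2): (1, 3), (1, 3): (1, 2)}
order_true_only = pruned_order(cfg, 1, {(1, 2)}, opposite)
order_full = pruned_order(cfg, 1, set(), opposite)
```

For π = {(1, 2)} node 3 never enters the ordering, so a work-list restricted to this ordering would not process it at all, in contrast to the transfer-function approach.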

### **5 Partitioning CFG**

A program can have many branches, and if we used each of them to partition the CFG the size of Π could become prohibitively large; thus we need to determine which branches should be used to generate Π. The goal of our selection heuristic is to choose those branches that might reduce the computational time. We explore three main characteristics of a conditional statement: (a) whether it has non-empty blocks of code b<sub>1</sub> and b<sub>2</sub> on both its *true* and *false* branches, respectively, (b) the size of b<sub>1</sub> and b<sub>2</sub> in relation to the entire method, and (c) the difference between the sizes of b<sub>1</sub> and b<sub>2</sub>.

The first heuristic ensures that there is an opportunity for a parallel execution of the two branches b<sub>1</sub> and b<sub>2</sub>. The next two heuristics quantify that opportunity. Among b<sub>1</sub> and b<sub>2</sub>, we select the one with the maximum block size and calculate its ratio to the number of statements in the method. We call this value r<sub>t</sub>. Then we calculate another ratio, r<sub>d</sub>, which is the ratio of the difference in block sizes to the number of statements in the method. If we use |b<sub>i</sub>| to denote the size of block b<sub>i</sub> and |m| the number of statements in method m, then

$$r\_t = \frac{\max(|b\_1|, |b\_2|)}{|m|}, \\ r\_d = \frac{abs(|b\_1| - |b\_2|)}{|m|}.$$

The larger r<sub>t</sub> and the smaller r<sub>d</sub>, the higher the chances that CSA performs better if those branches are used to partition the CFG. After selecting a set of branches, we first ensure, for sound CSA analysis, that they do not appear inside loops. Next, we combine the selected conditional statements c<sub>i</sub> with structural information about the CFG to generate an efficient set of partitions Π.
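As a small worked example of the two ratios, the sketch below computes r<sub>t</sub> and r<sub>d</sub> from block sizes and applies thresholds. The default thresholds of 3% and 60% are the values reported in the experiment description of Sect. 6.1; the function names and the use of inclusive comparisons are assumptions.

```python
def branch_ratios(b1, b2, m):
    """Return (r_t, r_d) for block sizes b1, b2 and method size m (in statements)."""
    if m <= 0:
        raise ValueError("method size must be positive")
    r_t = max(b1, b2) / m
    r_d = abs(b1 - b2) / m
    return r_t, r_d

def is_candidate(b1, b2, m, min_rt=0.03, max_rd=0.60):
    """Heuristic (a): both blocks non-empty; (b) r_t large enough; (c) r_d small enough."""
    if b1 == 0 or b2 == 0:
        return False
    r_t, r_d = branch_ratios(b1, b2, m)
    return r_t >= min_rt and r_d <= max_rd

# A conditional with blocks of 30 and 20 statements in a 100-statement method.
ratios = branch_ratios(30, 20, 100)
```

Here r<sub>t</sub> = 30/100 and r<sub>d</sub> = 10/100, so the conditional would pass both thresholds.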

For example, consider the CFG on the left of Fig. 2, where the c<sub>i</sub> are conditional statements and the b<sub>i</sub> are blocks of code. If the heuristic determines that the branches of c<sub>2</sub> are suitable for the CFG partition, then simply expressing the set of partitions Π as {{c2f}, {c2t}} would result in both CSAs computing the invariants along c<sub>1</sub>'s false branch, that is, performing the computation twice. In order to avoid this redundancy our partition algorithm traverses the CFG, finds all branches of the conditional statements through which the selected conditional statements are reachable, and stores them as an "abstracted" graph similar to the one shown on the right of Fig. 2. Next, using the abstracted graph we generate Π for CSA, which in this case is {{c1f}, {c1t, c2f}, {c1t, c2t}}.

Such post-processing also handles cases when both c<sub>2</sub> and c<sub>3</sub> are marked for partition. A simplistic approach is to create all possible combinations of their branches, but that results in redundant partitions that compute the same invariants; for example, {c3t, c4t} and {c3t, c4f} compute the true branch of c<sub>3</sub> both times. In contrast, our partition generation detects that c<sub>3</sub> and c<sub>4</sub> are independent. In our evaluation section we describe the threshold values we used for the r<sub>d</sub> and r<sub>t</sub> parameters.

### **6 Evaluation**

We evaluate our implementations of the path-defined conditional analysis using two distinct analyses: an intra-procedural value-based analysis (VB) and an intra-procedural reaching definitions analysis (RD). For the VB analysis we used the implementation and abstract domains that we developed in our previous work [8]. For RD we used the implementation provided with the Soot framework distribution. RD is a relatively fast analysis with an easily computable transfer function, while VB takes longer to complete due to its complex transfer function evaluations. For each of the analyses we performed experiments with their full versions SA, i.e., VB and RD, their CSA<sup>1</sup> versions implemented with Algorithm 3, which we name CVB<sup>1</sup> and CRD<sup>1</sup>, and their CSA<sup>2</sup> versions implemented with Algorithm 4, which we name CVB<sup>2</sup> and CRD<sup>2</sup>, respectively. The source code, program subjects and instructions on replicating the experiment are available on GitHub<sup>2</sup>.

**Program Subjects.** In order to perform our evaluations we first analyzed 105 methods in 19 Java classes across 10 open-source projects that we used in our previous work [8]. There we employed Boa [10] to mine methods of open-source programs from GitHub and count the number of operations in each method, and then randomly selected methods that contain at least 180 integer operations. Among those 105 methods we selected the methods with conditional statements that meet the first requirement of our partitioning algorithm, i.e., that have a non-trivial conditional statement where both the true and false branches have non-empty blocks of code. This step reduced the number of methods to 68. Among them, 53 methods have at least one non-trivial conditional statement outside of loops, which allows for computing sound CSA. Those methods have on average 177 statements and 19 simple conditional statements.

**Abstract Domain Subjects for VB Analysis.** VB analysis uses atomic elements of its abstract domain to express the computed program invariants. To determine whether the size of the disjoint abstract domain influences the efficiency of VB analysis we used three disjoint abstract domains of small (8 atomic elements), medium (10 atomic elements) and large (12 atomic elements) sizes. We randomly chose those abstract domains among available disjoint domains with the same number of atomic elements. Our preliminary experiments have shown that there is no difference in the evaluation data between the domain sizes, so we present the data only for the medium size domain.

<sup>2</sup> https://github.com/BoiseState/Conditional-DFA.

**Fig. 3.** Histograms of ratios between runtimes of full and conditional analyses.

#### **6.1 Experiment Description**

First we analyze the 53 methods using full SA, recording its run time and the invariants computed after each statement. The CSA evaluation consists of three main steps: (1) generating a set of partitions Π for each method, (2) running the CSA<sup>1</sup> and CSA<sup>2</sup> analyses on the partitions and recording run time and invariants, and (3) aggregating the computed invariants for partitions of the same method. We ran the experiments on a 2.9 GHz Intel Core i5 processor with 8 GB of memory running the OS X operating system, with the analysis running on Java RE 1.8.

**Step 1.** We implemented the partition algorithm from Sect. 5 in the Soot Java Optimization framework to take advantage of Soot's CFG and other related data structures. The partition algorithm takes as input a class and its method to be partitioned, and two parameters: a threshold r̄<sub>t</sub> that determines the minimum value for r<sub>t</sub>, and a threshold r<sub>d</sub> that determines the maximum value for r<sub>d</sub>. In our evaluations we set r̄<sub>t</sub> = 3% and r<sub>d</sub> = 60% for the majority of the methods and increased r̄<sub>t</sub> and decreased r<sub>d</sub> when the number of partitions became greater than 45. This resulted in an increase of r̄<sub>t</sub> to 15% for two methods and in the following (r̄<sub>t</sub>, r<sub>d</sub>) values for three methods: (15%, 30%), (20%, 15%) and (20%, 30%).

This step produced a total of 472 partitions for the 53 methods, with a minimum of two and a maximum of 32 partitions per method. A partition π is encoded as a set of branches that CSA should take, each defined by the conditional statement id and the branch's outcome: either true or false. As defined in our CSA framework, if a conditional statement is not present in π then CSA explores both of its branches.

**Step 2.** We implemented VB, CVB<sup>1</sup> and CVB<sup>2</sup> in the Soot Java Optimization framework and used Z3 version 4.3.2 as the constraint solver. CVB takes the following input parameters: a class name and its method to be analyzed, an abstract domain and a partition π. We executed CVB<sup>1</sup> and CVB<sup>2</sup> for each partition π, as well as the full VB analysis. We also implemented RD, CRD<sup>1</sup> and CRD<sup>2</sup> in the Soot framework. CRD takes three input parameters: a class name, its method to be analyzed and a partition π.

We recorded two sets of data that CSA produces: the running time of the analysis and the computed invariants for the corresponding analysis, i.e., sets of reaching definition elements for CRD and abstract values for variables expressed as SMT constraints for CVB. We executed each experiment three times and use the average to assess CSA performance. We do not report the time for partitioning since the partitioning is performed once and its running time is negligible compared to the analysis time. For the same reason we do not report the time for combining the analyses described in the next step.

**Table 2.** CVB Cost vs. Precision

**Step 3.** In the last step we combine the invariants of CSA in a way that allows us to answer our research questions. First we order the method partitions by their average execution time. Then, in order to determine all invariants computed at the point when a CSA completes, we combine all invariants from previously completed CSAs with the current one. The result is a sequence of aggregated invariants ordered by the execution time of the partitions, from fastest to slowest. To compare SA and CSA invariants we use the logical equivalence relation for two invariants. To compare RD and CRD we compare their sets of reaching definitions at each program location. To compare VB and CVB we evaluate implication relations between their SMT formulas, i.e., (CVB ⟹ VB) ∧ (VB ⟹ CVB), at each program point. If the formula evaluates to true, then we count it as a sound invariant for CVB. If the formula evaluates to false and the first implication evaluates to true, then CSA under-approximates the invariant of SA. Any other case in which the formula evaluates to false indicates either a conceptual mistake in our CSA approach or a bug in our implementations. We have not observed such cases in any of our experiments.
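The aggregation and comparison in this step can be sketched as follows. The sketch makes two assumptions that the text does not state: partial reaching-definition results are joined with set union, and the two implication checks for VB are already available as Booleans (in the evaluation they come from the SMT solver). All names and the sample data are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Reaching-definitions invariant: program location -> set of definition labels.
Invariant = Dict[str, FrozenSet[str]]


def aggregate(runs: List[Tuple[float, Invariant]]) -> List[Tuple[float, Invariant]]:
    """Sort partition results by runtime and accumulate them location-wise."""
    combined: Invariant = {}
    snapshots = []
    for runtime, inv in sorted(runs, key=lambda run: run[0]):
        for loc, defs in inv.items():
            # Assumption: partial results are joined with set union.
            combined[loc] = combined.get(loc, frozenset()) | defs
        snapshots.append((runtime, dict(combined)))
    return snapshots


def matching_fraction(partial: Invariant, full: Invariant) -> float:
    """Fraction of locations where the aggregated sets equal the full-analysis sets."""
    hits = sum(1 for loc, defs in full.items() if partial.get(loc, frozenset()) == defs)
    return hits / len(full)


def classify(csa_implies_sa: bool, sa_implies_csa: bool) -> str:
    """Interpret the two implication checks used for SMT-encoded invariants."""
    if csa_implies_sa and sa_implies_csa:
        return "sound"
    if csa_implies_sa:
        return "under-approximation"
    return "unexpected"  # would point to a conceptual mistake or a bug


# Made-up data: a full analysis and two partition results with their runtimes.
full = {"l1": frozenset({"d1"}), "l2": frozenset({"d1", "d2"})}
runs = [
    (30.0, {"l1": frozenset({"d1"}), "l2": frozenset({"d2"})}),
    (10.0, {"l1": frozenset({"d1"}), "l2": frozenset({"d1"})}),
]
for runtime, partial in aggregate(runs):
    print(runtime, matching_fraction(partial, full))
```

In this toy data set the fastest partition alone matches the full result at one of two locations, and the aggregate of both partitions matches everywhere.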

#### **6.2 Results**

**Performance.** To compare the performance of CSA and SA, we used the ratio between the runtime of the slowest CSA partition and the runtime of the full SA for each method. Fig. 3 shows the histograms of these ratios for each analysis implementation. The x-axes show the ratio values, and the labels on top of the bars are the counts for that bar interval.

The histograms show that CRD<sup>1</sup> performed the worst since it has many executions with higher runtimes than RD. However, their average runtimes across the 53 methods are comparable: 148 ms for CRD<sup>1</sup> and 143 ms for RD. This is because CRD<sup>1</sup> performed much better on larger methods than on smaller ones. Even though CRD<sup>2</sup> has 16 methods with ratios greater than 1, its average runtime is 108 ms, which makes this implementation 24% faster than RD.

Both CVB<sup>1</sup> and CVB<sup>2</sup> have a few methods with ratios greater than 1.0; however, those values are very close to 1.0. Among the 11 methods on which CVB<sup>1</sup> underperformed, 6 have ratios of 1.01 and the rest have ratios no greater than 1.05. Of the 8 methods on which CVB<sup>2</sup> underperformed, 5 have ratios of 1.01, 2 have ratios no greater than 1.05, and one has a ratio of 1.28. The average runtimes across the 53 methods are 6989 ms for CVB<sup>1</sup> and 7035 ms for CVB<sup>2</sup>, which is about 20% faster than VB's 8689 ms. Even though CVB<sup>1</sup> and CVB<sup>2</sup> have comparable performance, CVB<sup>2</sup> analyzed more methods faster than VB.
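The percentages quoted above follow directly from the reported average runtimes. The short computation below makes the arithmetic explicit; the helper name `relative_speedup` is ours, and the numbers are the averages stated in the text.

```python
def relative_speedup(baseline_ms: float, variant_ms: float) -> float:
    """Fraction of the baseline runtime that the variant saves (negative if slower)."""
    return (baseline_ms - variant_ms) / baseline_ms


# Average runtimes in milliseconds across the 53 methods, as reported above.
averages = {
    "CRD1 vs. RD": (143, 148),
    "CRD2 vs. RD": (143, 108),
    "CVB1 vs. VB": (8689, 6989),
    "CVB2 vs. VB": (8689, 7035),
}

for label, (baseline, variant) in averages.items():
    print(f"{label}: {100 * relative_speedup(baseline, variant):.1f}%")
```

The CRD<sup>2</sup> figure rounds to 24%, and both CVB figures round to roughly 20% (about 19.6% and 19.0%).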

**Invariants.** The results for sound invariant computation are presented in Table 1 for CRD and in Table 2 for CVB. The column headers describe the two points "0" and "100" and the four ranges "(0,25)", "[25, 50)", "[50, 75)" and "[75, 100)" of the percentage of sound invariants of a full SA that CSA is able to compute. The row headers show the ratio of the CSA running time to the running time of a full SA. The cell values represent the number of methods for which CSA is able to compute sound invariants within the given invariant range and within the given time interval. For example, in Table 2 the cell in the first data row and the second data column contains the value 21, which is interpreted as follows: for 21 methods CVB<sup>1</sup> is able to produce up to 25% of the sound invariants computed by a full VB within 20% of the running time of the full VB. The data in the second data row and the last column tell us that within 40% of the full VB running time CVB<sup>1</sup> is able to compute all invariants for 4 methods.

The data show that CSA can produce sound invariants faster for several methods and compute partial sound invariants for a majority of them. For example, CVB computes all invariants for 21 methods within 80% of the VB runtime and can produce partial sound invariants within 20% of the VB runtime. Note that the histogram counts and the values in the last column might not be equal. This is because CVB was able to produce the same invariant values as VB after computing only a few partitions, so the remaining partitions compute redundant information.

The data show that the efficiency of the CSA<sup>1</sup> and CSA<sup>2</sup> implementations depends on the analysis type. For CRD, CRD<sup>2</sup> performs better than CRD<sup>1</sup>. For CVB, however, both implementations produce close results, with CVB<sup>2</sup> performing slightly better than CVB<sup>1</sup>. CRD is more sensitive to the implementation because it is a relatively fast analysis: it runs in a fraction of a second, while CVB requires several minutes to complete. Overall, the second implementation of CSA, which requires a modification of the underlying topological order algorithm, is the better implementation choice.

#### **6.3 Discussion**

The results indicate that CSA allows for faster analysis while requiring minimal modification of SA frameworks. However, the main contribution of CSA is its ability to provide partial invariants in a fraction of the time of SA. While a user waits for all partitions to complete, she can use the invariants provided earlier to check the safety properties of the program. If such a property holds, then the user has more confidence in the program's correctness. If the property does not hold for the computed invariants, then she can start investigating the cause. Moreover, the partition information could accelerate this task since it narrows down the set of paths that cause the property violation.

### **7 Related Work**

Besides the related work on conditional analysis described in the introduction, our work relates to the body of research that improves the performance of SA algorithms and the accuracy of SA using a program's structural information. The body of work on designing parallel SA algorithms through partitioning the program's state space started back in the 1990s with the work of Lee et al. [11], who partitioned a program's CFG into strongly connected components, applied fixed point computation inside those components, and then used an elimination algorithm [12] to combine the data from the external nodes of those components. Albarghouthi et al. [13] investigated parallel interprocedural analysis of C programs, where, based on the reachability in the call graph, multiple methods are analyzed intraprocedurally in parallel. Dewey et al. [14] explored parallel analysis of JavaScript by partitioning the state space of the program into regions that can be computed in parallel and those that require synchronization of the parallel computations, i.e., merging points of the analysis.

Another body of work identifies partitions of the CFG to improve the precision of the analysis by delaying the merge of abstract values from different control flows, or by adding new abstract elements that exactly describe the join of two abstract elements, i.e., computing the disjunctive completion of the partially ordered set. However, disjunctive completion can lead to excessively large representations of abstract values, and at some point at least some values should be joined in order for the computation to reach its fixed point. Prior research has explored which abstract values should be joined; computational traces [15] or some other heuristic based on the CFG, such as a trace partitioning domain method [16], can provide a basis for these determinations.

Another approach is to delay the join operation by conducting incremental analysis, as in guided analysis [17]. In this approach, each iteration of the fixed point computation is applied to an incrementally augmented subgraph of P's CFG. For instance, on the first iteration, i.e., propagating abstract values through the CFG, the analysis considers the true branch of a conditional statement, and on the second iteration it adds the false branch. This approach limits the loss of precision resulting from widening operators for numerical domains, such as polyhedra, that have infinite ascending chains. This incremental approach also includes a disjunctive extension in which the analysis first performs a fixed point computation before extending the part of the CFG to be analyzed, i.e., it computes invariants successively. An orthogonal approach is the path focusing technique [18], which computes invariants separately for each path between two loop-free points in the CFG. Thus, each part of the CFG between the entrance and the exit of a loop is expanded into a set of paths. After the computation is done, the results of all paths are joined.

The latest development has been in combining guided analysis and path focusing techniques [19]. Using this approach, analysis continues to evaluate paths between loop-free points encoded separately with the SMT formula. This approach allows the analysis to explore only those paths that have the potential to improve the precision of the invariants.

Our approach is complementary to the above techniques, since a CSA for a single partition could use a parallel algorithm for computing its propagation to further improve CSA efficiency.

### **8 Conclusion and Future Work**

In this work we introduce structurally defined conditional static analysis, formalize it in terms of standard data-flow frameworks, and provide algorithms for CSA together with two distinct implementations. We evaluate the efficiency and precision of these techniques through an extensive empirical study on real-world programs. The key insight is that CSA partitions a program's CFG into a set of subgraphs at the conditional statements. These partitions induce a series of independent CSA executions that can run in parallel. The empirical evaluation suggests that CSA provides improvements over the full SA for a significant fraction of a program. In particular, depending on the analysis, around 24% of the methods completed their analysis within 60% of the run time required by the full SA. Moreover, CSA is able to produce partial safe invariants for a majority of the programs.

In the future we plan to further improve the efficiency of CSA and the confidence in the partial information that it produces. Currently, CSAs that follow the same path prefix compute identical information for that prefix. We plan to investigate an approach in which only one analysis computes the prefix information and communicates it to the other CSAs that share the prefix. In addition, we would like to classify CSA's partially computed invariants as safe or under-approximating based on the partition that CSA analyzes. Thus, when a CSA computes an invariant that is marked as safe, the user can use it with the same amount of confidence as she would for the full SA.

**Acknowledgment.** The authors would like to thank Eric Keefe for working on CSA<sup>2</sup> implementation during his REU experience at Boise State University supported by the National Science Foundation under award CNS 1461133.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Geometric Nontermination Arguments**

Jan Leike<sup>1</sup> and Matthias Heizmann<sup>2</sup>

<sup>1</sup> Australian National University, Canberra, Australia <sup>2</sup> University of Freiburg, Freiburg im Breisgau, Germany heizmann@informatik.uni-freiburg.de

**Abstract.** We present a new kind of nontermination argument, called *geometric nontermination argument*. The geometric nontermination argument is a finite representation of an infinite execution that has the form of a sum of several geometric series. For so-called linear lasso programs we can decide the existence of a geometric nontermination argument using a nonlinear algebraic ∃-constraint. We show that a deterministic conjunctive loop program with nonnegative eigenvalues is nonterminating if and only if there exists a geometric nontermination argument. Furthermore, we present an evaluation that demonstrates that our method is feasible in practice.

### **1 Introduction**

The problem of whether a program is terminating is undecidable in general. One way to approach this problem in practice is to analyze the existence of termination arguments and nontermination arguments. The existence of a certain termination argument, e.g., a linear ranking function, is decidable [4,31] and implies termination. However, if we cannot find a linear ranking function, we cannot conclude nontermination. Vice versa, the existence of a certain nontermination argument, e.g., a linear recurrence set [20], is decidable and implies nontermination. However, if we cannot find such a recurrence set, we cannot conclude termination.

In this paper<sup>1</sup> we present a new kind of nontermination argument, which we call *geometric nontermination argument (GNTA)*. Unlike a recurrence set, a geometric nontermination argument does not only imply nontermination, it also explicitly represents an infinite program execution. Hence a user sees immediately whether the counterexample to termination is a fixpoint or an unbounded diverging execution. An infinite program execution that is represented by a geometric nontermination argument can be written as a pointwise sum of several geometric series. We show that such an infinite execution exists for each deterministic conjunctive loop program that is nonterminating and whose transition matrix has only nonnegative eigenvalues.

<sup>1</sup> An extended version of this paper [29] contains more examples and further explanations.

© The Author(s) 2018

D. Beyer and M. Huisman (Eds.): TACAS 2018, LNCS 10806, pp. 266–283, 2018. https://doi.org/10.1007/978-3-319-89963-3\_16

$$\begin{array}{lll}
\texttt{b := 1;} & \texttt{b := 1;} & \texttt{b := 1;} \\
\texttt{while (a+b >= 3):} & \texttt{while (a+b >= 3):} & \texttt{while (a+b >= 4):} \\
\quad \texttt{a := 3*a + 1;} & \quad \texttt{a := 3*a - 2;} & \quad \texttt{a := 3*a + b;} \\
\quad \texttt{b := nondet();} & \quad \texttt{b := 2*b;} & \quad \texttt{b := 2*b;} \\
 & & \\
\text{(a)} & \text{(b)} & \text{(c)}
\end{array}$$

**Fig. 1.** Three nonterminating linear lasso programs. Each has an infinite execution which is either a geometric series or a pointwise sum of geometric series. The first lasso program is nondeterministic because the variable b gets some nondeterministic value in each iteration.

We restrict ourselves to linear lasso programs. A lasso program consists of a single while loop that is preceded by straight-line code. The name refers to the lasso-shaped form of the control flow graph. Usually, linear lasso programs do not occur as stand-alone programs. Instead, they are used as a finite representation of an infinite path in a control flow graph. For example, they arise as (potentially spurious) counterexamples in termination analysis [6,16,21,22,24,25,32,33,37], stability analysis [11,34], cost analysis [1,19], or the verification of temporal properties [7,13–15,18] for programs.

We present a constraint-based approach that allows us to check whether a linear conjunctive lasso program has a geometric nontermination argument and to synthesize one if it exists.

Our analysis is motivated by what is probably the simplest form of an infinite execution, namely an infinite execution in which the same state is repeated forever. We call such a state a fixed point. For lasso programs we can reduce the check for the existence of a fixed point to a constraint solving problem as follows. Let us assume that the stem and the loop of the lasso program are given as formulas over primed and unprimed variables, $\text{STEM}(\boldsymbol{x}, \boldsymbol{x}')$ and $\text{LOOP}(\boldsymbol{x}, \boldsymbol{x}')$. The infinite sequence $\boldsymbol{s}_0, \bar{\boldsymbol{s}}, \bar{\boldsymbol{s}}, \bar{\boldsymbol{s}}, \ldots$ is a nonterminating execution of the lasso program iff the assignment $\boldsymbol{x}_0 \mapsto \boldsymbol{s}_0$, $\bar{\boldsymbol{x}} \mapsto \bar{\boldsymbol{s}}$ is a satisfying assignment for the constraint $\text{STEM}(\boldsymbol{x}_0, \bar{\boldsymbol{x}}) \wedge \text{LOOP}(\bar{\boldsymbol{x}}, \bar{\boldsymbol{x}})$. In this paper, we present a constraint that is satisfiable not only if the program has a fixed point, but also if the program has a nonterminating execution that can be written as a pointwise sum of geometric series.

Let us motivate the representation of infinite executions as sums of geometric series in three steps. The program depicted in Fig. 1a shows a lasso program which does not have a fixed point but the following infinite execution.

$$\left( \begin{smallmatrix} 2\\0 \end{smallmatrix} \right), \left( \begin{smallmatrix} 2\\1 \end{smallmatrix} \right), \left( \begin{smallmatrix} 7\\1 \end{smallmatrix} \right), \left( \begin{smallmatrix} 22\\1 \end{smallmatrix} \right), \left( \begin{smallmatrix} 67\\1 \end{smallmatrix} \right), \dots$$

We can write this infinite execution as a geometric series where for $t > 1$ the $t$-th state is the sum $\boldsymbol{x}_1 + \sum_{i=0}^{t-2} \lambda^i \boldsymbol{y}$, where we have $\boldsymbol{x}_1 = \left(\begin{smallmatrix} 2 \\ 1 \end{smallmatrix}\right)$, $\boldsymbol{y} = \left(\begin{smallmatrix} 5 \\ 0 \end{smallmatrix}\right)$, and $\lambda = 3$. The state $\boldsymbol{x}_1$ is the state before the loop is executed for the first time. Intuitively, $\boldsymbol{y}$ is the direction in which the execution is moving initially, and $\lambda$ is the speed at which the execution continues to move in this direction.
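The claim can be checked mechanically by running the program of Fig. 1a and comparing each state with the closed form. The sketch below makes two assumptions for illustration: the nondeterministic assignment is always resolved to 1, and the value of b before the stem is 0, as in the execution shown above. The function names are ours.

```python
def run_fig1a(a: int, steps: int):
    """Execute the lasso of Fig. 1a, resolving nondet() to 1 in every iteration."""
    states = [(a, 0)]            # state before the stem (b = 0 is an assumed start value)
    b = 1                        # stem: b := 1
    states.append((a, b))
    for _ in range(steps):
        assert a + b >= 3        # the loop guard must hold for the loop to continue
        a, b = 3 * a + 1, 1      # loop body with nondet() = 1
        states.append((a, b))
    return states


def closed_form(t: int):
    """t-th state (t >= 1) as x1 + sum_{i=0}^{t-2} lambda^i * y."""
    x1, y, lam = (2, 1), (5, 0), 3
    scale = sum(lam ** i for i in range(t - 1))   # empty sum for t = 1
    return (x1[0] + scale * y[0], x1[1] + scale * y[1])


states = run_fig1a(a=2, steps=6)
for t in range(1, len(states)):
    assert states[t] == closed_form(t)
print(states[:5])
```

The assertion inside the loop also confirms that the guard `a+b >= 3` stays true along the whole prefix, which is what makes the execution nonterminating.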

Next, let us consider the lasso program depicted in Fig. 1b which has the following infinite execution.

$$\left(\begin{smallmatrix} 2\\0 \end{smallmatrix}\right), \left(\begin{smallmatrix} 2\\1 \end{smallmatrix}\right), \left(\begin{smallmatrix} 4\\2 \end{smallmatrix}\right), \left(\begin{smallmatrix} 10\\4 \end{smallmatrix}\right), \left(\begin{smallmatrix} 28\\8 \end{smallmatrix}\right), \dots$$

We cannot write this execution as a geometric series as we did above. Intuitively, the reason is that the values of both variables are increasing at different speeds and hence this execution is not moving in a single direction. However, we can write this infinite execution as a sum of geometric series where for $t \in \mathbb{N} \setminus \{0\}$ the $t$-th state can be written as a sum $\boldsymbol{x}_1 + \sum_{i=0}^{t-2} Y \left(\begin{smallmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{smallmatrix}\right)^i \mathbf{1}$, where we have $\boldsymbol{x}_1 = \left(\begin{smallmatrix} 2 \\ 1 \end{smallmatrix}\right)$, $Y = \left(\begin{smallmatrix} 2 & 0 \\ 0 & 1 \end{smallmatrix}\right)$, $\lambda_1 = 3$, $\lambda_2 = 2$, and $\mathbf{1}$ denotes the column vector of ones. Intuitively, our execution is moving in two different directions at different speeds. The directions are reflected by the column vectors of $Y$, and the values of $\lambda_1$ and $\lambda_2$ reflect the respective speeds.

Let us next consider the lasso program in Fig. 1c which has the following infinite execution.

$$\left( \begin{smallmatrix} 3\\0 \end{smallmatrix} \right), \left( \begin{smallmatrix} 3\\1 \end{smallmatrix} \right), \left( \begin{smallmatrix} 10\\2 \end{smallmatrix} \right), \left( \begin{smallmatrix} 32\\4 \end{smallmatrix} \right), \left( \begin{smallmatrix} 100\\8 \end{smallmatrix} \right), \dots$$

We cannot write this execution as a pointwise sum of geometric series in the form that we used above. Intuitively, the problem is that one of the initial directions contributes at two different speeds to the overall progress of the execution. However, we can write this infinite execution as a pointwise sum of geometric series where for $t \in \mathbb{N} \setminus \{0\}$ the $t$-th state can be written as a sum $\boldsymbol{x}_1 + \sum_{i=0}^{t-2} Y \left(\begin{smallmatrix} \lambda_1 & \mu \\ 0 & \lambda_2 \end{smallmatrix}\right)^i \mathbf{1}$, where we have $\boldsymbol{x}_1 = \left(\begin{smallmatrix} 3 \\ 1 \end{smallmatrix}\right)$, $Y = \left(\begin{smallmatrix} 4 & 3 \\ 0 & 1 \end{smallmatrix}\right)$, $\lambda_1 = 3$, $\lambda_2 = 2$, $\mu = 1$, and $\mathbf{1}$ denotes the column vector of ones. We call the tuple $(\boldsymbol{x}_0, \boldsymbol{x}_1, Y, \lambda_1, \lambda_2, \mu)$, which we use as a finite representation for the infinite execution, a *geometric nontermination argument*.
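Both matrix-based representations can be checked against a direct simulation of the loop bodies of Fig. 1b and Fig. 1c. The following sketch uses plain integer lists; the helper names `mat_vec`, `gnta_state` and `simulate` are ours and not part of the paper.

```python
def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def gnta_state(x1, Y, U, t):
    """t-th state (t >= 1): x1 + sum_{i=0}^{t-2} Y U^i 1."""
    state = list(x1)
    u_pow_ones = [1] * len(U)                 # U^0 * 1
    for _ in range(t - 1):
        step = mat_vec(Y, u_pow_ones)
        state = [s + d for s, d in zip(state, step)]
        u_pow_ones = mat_vec(U, u_pow_ones)   # advance to U^(i+1) * 1
    return state


def simulate(update, x1, t):
    """t-th state (t >= 1) obtained by applying the loop body t-1 times to x1."""
    state = list(x1)
    for _ in range(t - 1):
        state = update(state)
    return state


# Fig. 1b: a := 3*a - 2; b := 2*b   with   U = diag(3, 2)
fig1b = dict(x1=[2, 1], Y=[[2, 0], [0, 1]], U=[[3, 0], [0, 2]])
body_b = lambda s: [3 * s[0] - 2, 2 * s[1]]

# Fig. 1c: a := 3*a + b; b := 2*b   with   U = ((3, 1), (0, 2))
fig1c = dict(x1=[3, 1], Y=[[4, 3], [0, 1]], U=[[3, 1], [0, 2]])
body_c = lambda s: [3 * s[0] + s[1], 2 * s[1]]

for args, body in ((fig1b, body_b), (fig1c, body_c)):
    for t in range(1, 12):
        assert gnta_state(t=t, **args) == simulate(body, args["x1"], t)
print(gnta_state(t=4, **fig1c))
```

The only difference between the two cases is the off-diagonal entry of U, which is what lets the first direction of Fig. 1c contribute at both speeds.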

In this paper, we formally introduce the notion of a geometric nontermination argument for linear lasso programs (Sect. 3) and we prove that each nonterminating deterministic conjunctive linear loop program whose transition matrix has only nonnegative real eigenvalues has a geometric nontermination argument, i.e., each such nonterminating linear loop program has an infinite execution which can be written as a sum of geometric series (Sect. 4).

### **2 Preliminaries**

We denote vectors $\boldsymbol{x}$ with bold symbols and matrices with uppercase Latin letters. Vectors are always understood to be column vectors, $\mathbf{1}$ denotes a vector of ones, $\mathbf{0}$ denotes a vector of zeros (of the appropriate dimension), and $\boldsymbol{e}_i$ denotes the $i$-th unit vector.

#### **2.1 Linear Lasso Programs**

In this work, we consider linear lasso programs, i.e., programs that consist of a program stem and a single loop. We use binary relations over the program's states to define the stem and the loop transition relation. Variables are assumed to be real-valued.

We denote by $\boldsymbol{x}$ the vector of $n$ variables $(x_1, \ldots, x_n)^T \in \mathbb{R}^n$ corresponding to program states, and by $\boldsymbol{x}' = (x'_1, \ldots, x'_n)^T \in \mathbb{R}^n$ the variables of the next state.

**Definition 1 (Linear Lasso Program).** *A (conjunctive)* linear lasso program $L = (\text{STEM}, \text{LOOP})$ *consists of two binary relations defined by formulas with the free variables* $\boldsymbol{x}$ *and* $\boldsymbol{x}'$ *of the form*

$$A \begin{pmatrix} \boldsymbol{x} \\ \boldsymbol{x}' \end{pmatrix} \le \boldsymbol{b}$$

*for some matrix* $A \in \mathbb{R}^{n \times m}$ *and some vector* $\boldsymbol{b} \in \mathbb{R}^m$*.*

A *linear loop program* is a linear lasso program L without stem, i.e., a linear lasso program such that the relation STEM is equivalent to true.

**Definition 2 (Deterministic Linear Lasso Program).** *A linear loop program* $L$ *is called* deterministic *iff its loop transition* LOOP *can be written in the following form*

$$(\boldsymbol{x}, \boldsymbol{x}') \in \text{LOOP} \iff G\boldsymbol{x} \le \boldsymbol{g} \;\wedge\; \boldsymbol{x}' = M\boldsymbol{x} + \boldsymbol{m}$$

*for some matrices* $G \in \mathbb{R}^{n \times m}$*,* $M \in \mathbb{R}^{n \times n}$*, and vectors* $\boldsymbol{g} \in \mathbb{R}^m$ *and* $\boldsymbol{m} \in \mathbb{R}^n$*.*
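As an illustration of this form, the loop of Fig. 1b can be written with a guard $G\boldsymbol{x} \le \boldsymbol{g}$ and an affine update $\boldsymbol{x}' = M\boldsymbol{x} + \boldsymbol{m}$. The sketch below is a minimal encoding under the assumption that the state vector is ordered as (a, b); the function names are ours.

```python
from typing import List, Optional

Vector = List[int]
Matrix = List[List[int]]


def guard_holds(G: Matrix, g: Vector, x: Vector) -> bool:
    """Check G x <= g row by row."""
    return all(sum(row[j] * x[j] for j in range(len(x))) <= bound
               for row, bound in zip(G, g))


def step(G: Matrix, g: Vector, M: Matrix, m: Vector, x: Vector) -> Optional[Vector]:
    """Return the unique successor M x + m, or None if the guard is violated."""
    if not guard_holds(G, g, x):
        return None
    return [sum(M[i][j] * x[j] for j in range(len(x))) + m[i] for i in range(len(m))]


# Loop of Fig. 1b over the state (a, b):
#   guard  a + b >= 3   is   -a - b <= -3
#   update a := 3*a - 2; b := 2*b
G, g = [[-1, -1]], [-3]
M, m = [[3, 0], [0, 2]], [-2, 0]

print(step(G, g, M, m, [2, 1]))   # guard holds, one successor
print(step(G, g, M, m, [0, 0]))   # guard fails, no successor
```

Determinism shows up in the return type: every state has at most one successor, whereas the loop of Fig. 1a would need a relation because of `nondet()`.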

**Definition 3 (Nontermination).** *A linear lasso program* $L$ *is* nonterminating *iff there is an infinite sequence of states* $\boldsymbol{x}_0, \boldsymbol{x}_1, \ldots$*, called an* infinite execution of $L$*, such that* $(\boldsymbol{x}_0, \boldsymbol{x}_1) \in \text{STEM}$ *and* $(\boldsymbol{x}_t, \boldsymbol{x}_{t+1}) \in \text{LOOP}$ *for all* $t \ge 1$*.*

#### **2.2 Jordan Normal Form**

Let $M \in \mathbb{R}^{n \times n}$ be a real square matrix. If there is an invertible square matrix $S$ and a diagonal matrix $D$ such that $M = SDS^{-1}$, then $M$ is called *diagonalizable*. The column vectors of $S$ form the basis over which $M$ has diagonal form. In general, real matrices are not diagonalizable. However, every real square matrix $M$ with real eigenvalues has a representation which is almost diagonal, called *Jordan normal form*. This is a matrix that is zero except for the eigenvalues on the diagonal and one superdiagonal containing ones and zeros.

Formally, a Jordan normal form is a matrix $J = \operatorname{diag}(J_{i_1}(\lambda_1), \ldots, J_{i_k}(\lambda_k))$ where $\lambda_1, \ldots, \lambda_k$ are the eigenvalues of $M$ and the real square matrices $J_i(\lambda) \in \mathbb{R}^{i \times i}$ are *Jordan blocks*,

$$J_i(\lambda) := \begin{pmatrix} \lambda & 1 & 0 & \dots & 0 & 0 \\ 0 & \lambda & 1 & \dots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \dots & \lambda & 1 \\ 0 & 0 & 0 & \dots & 0 & \lambda \end{pmatrix}.$$

The subspace corresponding to each distinct eigenvalue is called *generalized eigenspace* and their basis vectors *generalized eigenvectors*.

**Theorem 4 (Jordan Normal Form).** *For each real square matrix* $M \in \mathbb{R}^{n \times n}$ *with real eigenvalues, there is an invertible real square matrix* $V \in \mathbb{R}^{n \times n}$ *and a Jordan normal form* $J \in \mathbb{R}^{n \times n}$ *such that* $M = VJV^{-1}$*.*
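A small worked instance may help to see what the theorem asserts. The matrix `M` below is a hand-picked example (not taken from the paper) with the double eigenvalue 2 that is not diagonalizable; `V` holds a generalized eigenvector basis that we computed by hand. Since `V` is invertible, checking $MV = VJ$ is equivalent to $M = VJV^{-1}$.

```python
def jordan_block(lam, size):
    """Build the Jordan block J_size(lam): lam on the diagonal, ones above it."""
    return [[lam if i == j else (1 if j == i + 1 else 0) for j in range(size)]
            for i in range(size)]


def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


M = [[3, 1], [-1, 1]]        # example matrix: trace 4, determinant 4, eigenvalue 2 twice
V = [[1, 1], [-1, 0]]        # columns: eigenvector (1, -1) and generalized eigenvector (1, 0)
J = jordan_block(2, 2)

det_V = V[0][0] * V[1][1] - V[0][1] * V[1][0]
assert det_V != 0                      # V is invertible
assert mat_mul(M, V) == mat_mul(V, J)  # equivalent to M = V J V^{-1}
print(J)
```

For larger matrices one would normally rely on a computer algebra system rather than a hand-computed basis; the check itself stays the same.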

### **3 Geometric Nontermination Arguments**

Fix a conjunctive linear lasso program $L = (\text{STEM}, \text{LOOP})$ and let $A \in \mathbb{R}^{n \times m}$ and $\boldsymbol{b} \in \mathbb{R}^m$ define the loop transition such that

$$(\boldsymbol{x}, \boldsymbol{x}') \in \text{LOOP} \iff A \begin{pmatrix} \boldsymbol{x} \\ \boldsymbol{x}' \end{pmatrix} \le \boldsymbol{b}.$$

**Definition 5 (Geometric Nontermination Argument).** *A tuple* $(\boldsymbol{x}_0, \boldsymbol{x}_1, \boldsymbol{y}_1, \ldots, \boldsymbol{y}_s, \lambda_1, \ldots, \lambda_s, \mu_1, \ldots, \mu_{s-1})$ *is called a* geometric nontermination argument *for the linear lasso program* $L = (\text{STEM}, \text{LOOP})$ *iff all of the following statements hold.*

*(domain)* $\boldsymbol{x}_0, \boldsymbol{x}_1, \boldsymbol{y}_1, \ldots, \boldsymbol{y}_s \in \mathbb{R}^n$*, and* $\lambda_1, \ldots, \lambda_s, \mu_1, \ldots, \mu_{s-1} \ge 0$

*(init)* $(\boldsymbol{x}_0, \boldsymbol{x}_1) \in \text{STEM}$

*(point)* $A \begin{pmatrix} \boldsymbol{x}_1 \\ \boldsymbol{x}_1 + \sum_{k=1}^{s} \boldsymbol{y}_k \end{pmatrix} \le \boldsymbol{b}$

*(ray)* $A \begin{pmatrix} \boldsymbol{y}_1 \\ \lambda_1 \boldsymbol{y}_1 \end{pmatrix} \le \mathbf{0}$ *and* $A \begin{pmatrix} \boldsymbol{y}_k \\ \lambda_k \boldsymbol{y}_k + \mu_{k-1} \boldsymbol{y}_{k-1} \end{pmatrix} \le \mathbf{0}$ *for each* $k \in \{2, \ldots, s\}$*.*

*The number* $s \ge 0$ *is the* size *of the geometric nontermination argument.*

The existence of a geometric nontermination argument can be checked using an SMT solver. The constraints given by (domain), (init), (point), (ray) are nonlinear algebraic constraints and the satisfiability of these constraints is decidable.
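Synthesizing an argument requires a solver for nonlinear constraints, but checking a given candidate only needs matrix-vector arithmetic. The following sketch checks (domain), (point) and (ray) for the loop of Fig. 1c, encoded over the variable order (a, b, a', b') with each equality split into two inequalities. The encoding, the function names, and the omission of the (init) check on the stem are our simplifications.

```python
from typing import List, Sequence


def mat_vec(A: Sequence[Sequence[int]], v: Sequence[int]) -> List[int]:
    return [sum(row[j] * v[j] for j in range(len(v))) for row in A]


def leq(lhs: Sequence[int], rhs: Sequence[int]) -> bool:
    return all(l <= r for l, r in zip(lhs, rhs))


def is_gnta_for_loop(A, b, x1, ys, lambdas, mus) -> bool:
    """Check (domain), (point) and (ray); the stem condition (init) is not checked."""
    s, n = len(ys), len(x1)
    if len(lambdas) != s or len(mus) != max(s - 1, 0):
        return False
    if any(c < 0 for c in list(lambdas) + list(mus)):          # (domain)
        return False
    total = [x1[i] + sum(y[i] for y in ys) for i in range(n)]
    if not leq(mat_vec(A, list(x1) + total), b):               # (point)
        return False
    zero = [0] * len(b)
    for k in range(s):                                         # (ray)
        nxt = [lambdas[k] * ys[k][i] + (mus[k - 1] * ys[k - 1][i] if k > 0 else 0)
               for i in range(n)]
        if not leq(mat_vec(A, list(ys[k]) + nxt), zero):
            return False
    return True


# Loop of Fig. 1c:  a + b >= 4,  a' = 3a + b,  b' = 2b
A = [[-1, -1, 0, 0],
     [-3, -1, 1, 0], [3, 1, -1, 0],
     [0, -2, 0, 1], [0, 2, 0, -1]]
b = [-4, 0, 0, 0, 0]

print(is_gnta_for_loop(A, b, x1=[3, 1], ys=[[4, 0], [3, 1]], lambdas=[3, 2], mus=[1]))
```

The vectors `ys` are the columns of the matrix $Y$ from the introduction; replacing $\lambda_1 = 3$ by a different value makes the first (ray) condition fail.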

**Proposition 6 (Soundness).** *If there is a geometric nontermination argument for a linear lasso program* L*, then* L *is nonterminating.*

*Proof.* We define $Y := (\boldsymbol{y}_1 \ldots \boldsymbol{y}_s)$ as the matrix containing the vectors $\boldsymbol{y}_i$ as columns, and we define the following $s \times s$ matrix.

$$U := \begin{pmatrix} \lambda_1 & \mu_1 & 0 & \dots & 0 & 0 \\ 0 & \lambda_2 & \mu_2 & \dots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \dots & \lambda_{s-1} & \mu_{s-1} \\ 0 & 0 & 0 & \dots & 0 & \lambda_s \end{pmatrix} \tag{1}$$

Following Definition 3 we show that the linear lasso program L has the infinite execution

$$x\_0, \quad x\_1, \quad x\_1 + Y\mathbf{1}, \quad x\_1 + Y\mathbf{1} + YU\mathbf{1}, \quad x\_1 + Y\mathbf{1} + YU\mathbf{1} + YU^2\mathbf{1}, \quad \dots \tag{2}$$

From (init) we get $(\boldsymbol{x}_0, \boldsymbol{x}_1) \in \text{STEM}$. It remains to show that

$$\left(x\_1 + \sum\_{j=0}^{t-1} YU^j \mathbf{1}, \ x\_1 + \sum\_{j=0}^t YU^j \mathbf{1}\right) \in \text{Loop for all } t \in \mathbb{N}.\tag{3}$$

According to (domain) the matrix $U$ has only nonnegative entries, so the same holds for the matrix $Z := \sum_{j=0}^{t-1} U^j$. Hence $Z\mathbf{1}$ has only nonnegative entries and thus $YZ\mathbf{1}$ can be written as $\sum_{k=1}^{s} \alpha_k \boldsymbol{y}_k$ for some $\alpha_k \ge 0$. We multiply inequality number $k$ from (ray) by $\alpha_k$ and get

$$A \begin{pmatrix} \alpha_k \boldsymbol{y}_k \\ \alpha_k \lambda_k \boldsymbol{y}_k + \alpha_k \mu_{k-1} \boldsymbol{y}_{k-1} \end{pmatrix} \le \mathbf{0}, \tag{4}$$

where we define for convenience $\boldsymbol{y}_0 := \mathbf{0}$ and $\mu_0 := 0$. Now we sum (4) for all $k$ and add (point) to get

$$A \begin{pmatrix} \boldsymbol{x}_1 + \sum_{k} \alpha_k \boldsymbol{y}_k \\ \boldsymbol{x}_1 + \sum_{k} \boldsymbol{y}_k + \sum_{k} (\alpha_k \lambda_k \boldsymbol{y}_k + \alpha_k \mu_{k-1} \boldsymbol{y}_{k-1}) \end{pmatrix} \le \boldsymbol{b}. \tag{5}$$

By definition of αk, we have

$$x\_1 + \sum\_{k=1}^{s} \alpha\_k y\_k = x\_1 + YZ \mathbf{1} \ = x\_1 + \sum\_{j=0}^{t-1} YU^j \mathbf{1}$$

and

$$\begin{aligned} x\_1 + \sum\_{k=1}^s y\_k + \sum\_{k=1}^s (\alpha\_k \lambda\_k y\_k + \alpha\_k \mu\_{k-1} y\_{k-1}) &= x\_1 + Y\mathbf{1} + \sum\_{k=1}^s \alpha\_k YU e\_k \\ &= x\_1 + Y\mathbf{1} + YUZ\mathbf{1} \\ &= x\_1 + \sum\_{j=0}^t YU^j \mathbf{1}. \end{aligned}$$

Therefore (3) and (5) are the same, which concludes this proof.

**Proposition 7 (Closed Form of the Infinite Execution).** *For* $t \ge 2$ *the following is the closed form of the state* $\boldsymbol{x}_t = \boldsymbol{x}_1 + \sum_{j=0}^{t-2} Y U^j \mathbf{1}$ *in the infinite execution* (2)*. Let* $U =: N + D$ *where* $N$ *is a nilpotent matrix and* $D$ *is a diagonal matrix.*

$$YU^j\mathbf{1} = Y\left(\sum\_{i=0}^j \binom{j}{i} N^i D^{j-i}\right) \mathbf{1} = \sum\_{k=1}^s y\_k \sum\_{i=0}^{j-k+1} \binom{j}{i} \lambda\_{n-k-i}^{j-i} \prod\_{\ell=k}^{k+i-1} \mu\_\ell \qquad \Diamond$$

### **4 Completeness Results**

First we show that a linear loop program has a GNTA if it has a bounded infinite execution. In the next section we use this to prove our completeness result.

#### **4.1 Bounded Infinite Executions**

Let $|\cdot| : \mathbb{R}^n \to \mathbb{R}$ denote some norm. We call an infinite execution $(\boldsymbol{x}_t)_{t \ge 0}$ *bounded* iff there is a real number $d \in \mathbb{R}$ such that the norm of each state is bounded by $d$, i.e., $|\boldsymbol{x}_t| \le d$ for all $t$ (in $\mathbb{R}^n$ the notion of boundedness is independent of the choice of the norm).

**Lemma 8 (Fixed Point).** *Let* $L = (\text{true}, \text{LOOP})$ *be a linear loop program. The linear loop program* $L$ *has a bounded infinite execution if and only if there is a fixed point* $\boldsymbol{x}^{\ast} \in \mathbb{R}^n$ *such that* $(\boldsymbol{x}^{\ast}, \boldsymbol{x}^{\ast}) \in \text{LOOP}$*.*

*Proof.* If there is a fixed point $\boldsymbol{x}^{\ast}$, then the loop has the infinite bounded execution $\boldsymbol{x}^{\ast}, \boldsymbol{x}^{\ast}, \ldots$. Conversely, let $(\boldsymbol{x}_t)_{t \ge 0}$ be an infinite bounded execution. Boundedness implies that there is a $d \in \mathbb{R}$ such that $|\boldsymbol{x}_t| \le d$ for all $t$. Consider the sequence $\boldsymbol{z}_k := \frac{1}{k} \sum_{t=1}^{k} \boldsymbol{x}_t$.

$$\begin{aligned} |z\_k - z\_{k+1}| &= \left| \frac{1}{k} \sum\_{t=1}^k x\_t - \frac{1}{k+1} \sum\_{t=1}^{k+1} x\_t \right| = \frac{1}{k(k+1)} \left| (k+1) \sum\_{t=1}^k x\_t - k \sum\_{t=1}^{k+1} x\_t \right| \\ &= \frac{1}{k(k+1)} \left| \sum\_{t=1}^k x\_t - kx\_{k+1} \right| \le \frac{1}{k(k+1)} \left( \sum\_{t=1}^k |x\_t| + k|x\_{k+1}| \right) \\ &\le \frac{1}{k(k+1)} (k \cdot d + k \cdot d) = \frac{2d}{k+1} \longrightarrow 0 \text{ as } k \to \infty. \end{aligned}$$

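Hence consecutive averages become arbitrarily close. Moreover, |*z*<sub>k</sub>| ≤ d for all k, so by the Bolzano–Weierstrass theorem the sequence (*z*<sub>k</sub>)<sub>k≥1</sub> has a subsequence that converges to some *z*<sup>∗</sup> ∈ ℝ<sup>n</sup>, and by the estimate above the successors *z*<sub>k+1</sub> along this subsequence converge to *z*<sup>∗</sup> as well. In the following, limits k → ∞ are taken along this subsequence. We will show that *z*<sup>∗</sup> is the desired fixed point.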

For all t, the polyhedron

$$Q := \left\{ \begin{pmatrix} x \\ x' \end{pmatrix} \;\middle|\; A \begin{pmatrix} x \\ x' \end{pmatrix} \le b \right\}$$

contains the stacked vector of *x*<sub>t</sub> and *x*<sub>t+1</sub> and is convex. Therefore for all k ≥ 1,

$$\frac{1}{k}\sum\_{t=1}^{k} \begin{pmatrix} x\_t \\ x\_{t+1} \end{pmatrix} \in Q.$$

Together with

$$\left(\begin{array}{c} z\_k\\ \frac{k+1}{k}z\_{k+1} \end{array}\right) = \frac{1}{k} \left(\begin{array}{c} \mathbf{0} \\ x\_1 \end{array}\right) + \frac{1}{k} \sum\_{t=1}^k \left(\begin{array}{c} x\_t \\ x\_{t+1} \end{array}\right)$$

we infer

$$
\left( \left( \begin{array}{c} z\_k \\ \frac{k+1}{k} z\_{k+1} \end{array} \right) - \frac{1}{k} \left( \begin{smallmatrix} \mathbf{0} \\ x\_1 \end{smallmatrix} \right) \right) \in Q,
$$

and since Q is topologically closed we have

$$\begin{pmatrix} z^\* \\ z^\* \end{pmatrix} = \lim\_{k \to \infty} \left( \begin{pmatrix} z\_k \\ \frac{k+1}{k} z\_{k+1} \end{pmatrix} - \frac{1}{k} \begin{pmatrix} \mathbf{0} \\ x\_1 \end{pmatrix} \right) \in Q.$$
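The averaging construction in this proof can be tried out on a small instance. The following is only an illustrative sketch: the loop body `x' = -x + 2` with guard `true` is a hypothetical example that does not appear in the paper, and the names `step`, `cesaro_averages`, and `candidate` are chosen here for illustration. The execution 0, 2, 0, 2, ... is bounded but never reaches a fixed point, while its averages approach the fixed point 1.

```python
from fractions import Fraction


def step(x):
    """Hypothetical loop body x' = -x + 2 with guard `true` (not from the paper)."""
    return -x + 2


def cesaro_averages(x0, count):
    """Return [z_1, ..., z_count], where z_k = (1/k) * (x_1 + ... + x_k)."""
    averages = []
    total = Fraction(0)
    x = step(x0)  # x_1
    for k in range(1, count + 1):
        total += x
        averages.append(total / k)
        x = step(x)
    return averages


averages = cesaro_averages(Fraction(0), 1000)
candidate = Fraction(1)  # value the averages approach for this execution
print(averages[:4])
print(step(candidate) == candidate)  # the candidate is a fixed point of the loop
```

Exact rational arithmetic is used so that the comparison with the fixed point is not affected by rounding.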

 

Note that Lemma 8 does not transfer to lasso programs: there might only be one fixed point and the stem might exclude this point (e.g., a = −0.5 and b = 3.5 in the example of Fig. 1a).

Because fixed points give rise to trivial geometric nontermination arguments, we can derive a criterion for the existence of geometric nontermination arguments from Lemma 8.

**Corollary 9 (Bounded Infinite Executions).** *If the linear loop program* L = (true, LOOP) *has a bounded infinite execution, then it has a geometric nontermination argument of size* 0*.*

*Proof.* By Lemma 8 there is a fixed point *x*<sup>∗</sup> such that (*x*<sup>∗</sup>, *x*<sup>∗</sup>) ∈ LOOP. We choose *x*<sub>0</sub> = *x*<sub>1</sub> = *x*<sup>∗</sup>, which satisfies (point) and (ray) and thus is a geometric nontermination argument for L.

*Example 10.* Note that according to our definition of a linear lasso program, the relation LOOP is a topologically closed set. If we allowed the formula defining LOOP to also contain strict inequalities, Lemma 8 would no longer hold: the following program is nonterminating and has a bounded infinite execution, but it does not have a fixed point. However, the topological closure of the relation LOOP contains the fixed point a = 0.

$$\begin{array}{l} \text{while } (a > 0) : \\ \quad a := a / 2 ; \end{array}$$

Nevertheless, this example has a geometric nontermination argument, namely *x*<sub>1</sub> = 1, *y*<sub>1</sub> = −0.5, λ<sub>1</sub> = 0.5. ♦
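As a sanity check of Example 10, the following sketch enumerates the states described by the stated argument and compares them with the loop semantics. It assumes that a GNTA of size 1 describes the execution *x*<sub>1</sub>, *x*<sub>1</sub> + *y*<sub>1</sub>, *x*<sub>1</sub> + *y*<sub>1</sub> + λ<sub>1</sub>*y*<sub>1</sub>, ..., i.e., a geometric series in *y*<sub>1</sub>; the function name `gnta_states` is hypothetical.

```python
from fractions import Fraction

x1 = Fraction(1)
y1 = Fraction(-1, 2)
lam1 = Fraction(1, 2)


def gnta_states(x1, y1, lam1, count):
    """States x_1, x_1 + y_1, x_1 + y_1 + lam1*y_1, ... of a size-1 GNTA."""
    states = []
    x, ray = x1, y1
    for _ in range(count):
        states.append(x)
        x += ray       # add the current ray contribution
        ray *= lam1    # the contribution shrinks geometrically
    return states


states = gnta_states(x1, y1, lam1, 20)
guard_ok = all(a > 0 for a in states)                                    # a > 0 holds
update_ok = all(nxt == cur / 2 for cur, nxt in zip(states, states[1:]))  # a := a / 2
print(guard_ok, update_ok)
```

The check covers only finitely many states, so it illustrates the argument rather than proving nontermination.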

#### **4.2 Nonnegative Eigenvalues**

This section is dedicated to the proof of the following completeness result for deterministic linear loop programs.

**Theorem 11 (Completeness).** *If a deterministic linear loop program* L *of the form* while (G*x* ≤ *g*) do *x* := M*x* + *m* *with* n *variables is nonterminating and* M *has only nonnegative real eigenvalues, then there is a geometric nontermination argument for* L *of size at most* n*.*

To prove this completeness theorem, we need to construct a GNTA from a given infinite execution. The following lemma shows that we can restrict our construction to exclude all linear subspaces that have a bounded execution.

**Lemma 12 (Loop Disassembly).** *Let* L = (true, LOOP) *be a linear loop program over* ℝ<sup>n</sup> = U ⊕ V *where* U *and* V *are linear subspaces of* ℝ<sup>n</sup>*. Suppose* L *is nonterminating and there is an infinite execution that is bounded when projected to the subspace* U*. Let x*<sup>U</sup> *be the fixed point in* U *that exists according to Lemma 8. Then the linear loop program* L<sup>V</sup> *that we get by projecting to the subspace* V + *x*<sup>U</sup> *is nonterminating. Moreover, if* L<sup>V</sup> *has a GNTA of size* s*, then* L *has a GNTA of size* s*.*

*Proof.* Without loss of generality, we work in a basis adapted to U and V, so that the two spaces are separated by the use of different variables. Using the infinite execution of L that is bounded on U, we can repeat the construction from the proof of Lemma 8 to get an infinite execution *z*<sub>0</sub>, *z*<sub>1</sub>, ... that yields the fixed point *x*<sup>U</sup> when projected to U. We fix *x*<sup>U</sup> in the loop transition by replacing all variables from U with the values from *x*<sup>U</sup> and get the linear loop program L<sup>V</sup> (this is the projection to V + *x*<sup>U</sup>). Importantly, the projection of *z*<sub>0</sub>, *z*<sub>1</sub>, ... to V + *x*<sup>U</sup> is still an infinite execution, hence the loop L<sup>V</sup> is nonterminating. Given a GNTA for L<sup>V</sup> we can construct a GNTA for L by adding the vector *x*<sup>U</sup> to *x*<sub>0</sub> and *x*<sub>1</sub>.

*Proof (of Theorem* 11*).* The polyhedron corresponding to the loop transition of the deterministic linear loop program L is

$$
\begin{pmatrix} G & 0 \\ M & -I \\ -M & I \end{pmatrix} \begin{pmatrix} x \\ x' \end{pmatrix} \le \begin{pmatrix} g \\ -m \\ m \end{pmatrix} . \tag{6}
$$

Define $\mathcal{Y}$ to be the convex cone spanned by the rays of the guard polyhedron:

$$\mathcal{Y} := \{ \mathbf{y} \in \mathbb{R}^n \mid G\mathbf{y} \le 0 \}$$

Let $\overline{\mathcal{Y}}$ be the smallest linear subspace of ℝ<sup>n</sup> that contains $\mathcal{Y}$, i.e., $\overline{\mathcal{Y}} = \mathcal{Y} - \mathcal{Y}$ using pointwise subtraction, and let $\overline{\mathcal{Y}}^\perp$ be the linear subspace of ℝ<sup>n</sup> orthogonal to $\overline{\mathcal{Y}}$; hence $\mathbb{R}^n = \overline{\mathcal{Y}} \oplus \overline{\mathcal{Y}}^\perp$.

Let P := {*x* ∈ ℝ<sup>n</sup> | G*x* ≤ *g*} denote the guard polyhedron. Its projection $P^{\overline{\mathcal{Y}}^\perp}$ to the subspace $\overline{\mathcal{Y}}^\perp$ is again a polyhedron. By the decomposition theorem for polyhedra [36, Corollary 7.1b], $P^{\overline{\mathcal{Y}}^\perp} = Q + C$ for some polytope Q and some convex cone C. However, by definition of the subspace $\overline{\mathcal{Y}}^\perp$, the convex cone C must be equal to {**0**}: for any *y* ∈ C ⊆ $\overline{\mathcal{Y}}^\perp$, we have G*y* ≤ **0**, thus *y* ∈ $\mathcal{Y}$, and therefore *y* is orthogonal to itself, i.e., *y* = **0**. We conclude that $P^{\overline{\mathcal{Y}}^\perp}$ must be a polytope, and thus it is bounded. By assumption L is nonterminating, so its projection $L^{\overline{\mathcal{Y}}^\perp}$ is nonterminating, and since $P^{\overline{\mathcal{Y}}^\perp}$ is bounded, any infinite execution of $L^{\overline{\mathcal{Y}}^\perp}$ must be bounded.

Let $\mathcal{U}$ denote the direct sum of the generalized eigenspaces for the eigenvalues 0 ≤ λ < 1. Any infinite execution is necessarily bounded on the subspace $\mathcal{U}$ since on this space the map *x* ↦ M*x* + *m* is a contraction. Let $\mathcal{U}^\perp$ denote the subspace of ℝ<sup>n</sup> orthogonal to $\mathcal{U}$. The space $\overline{\mathcal{Y}} \cap \mathcal{U}^\perp$ is a linear subspace of ℝ<sup>n</sup> and any infinite execution in its complement is bounded. Hence we can turn our analysis to the subspace $\overline{\mathcal{Y}} \cap \mathcal{U}^\perp + x$ for some $x \in \overline{\mathcal{Y}}^\perp \oplus \mathcal{U}$ for the rest of the proof according to Lemma 12. From now on, we implicitly assume that we are in this space without changing any of the notation.
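The boundedness claim for eigenvalues in [0, 1) is easy to observe in one dimension. The sketch below is a hypothetical scalar instance (the values of `lam`, `m`, and the start state are chosen here, not taken from the paper): for the map x ↦ λx + m with 0 ≤ λ < 1, the distance to the fixed point m / (1 − λ) shrinks by the factor λ in every step, so every execution stays between its start state and the fixed point.

```python
from fractions import Fraction


def iterate_affine(lam, m, x0, steps):
    """Return [x_0, ..., x_steps] for the update x := lam * x + m."""
    xs = [x0]
    for _ in range(steps):
        xs.append(lam * xs[-1] + m)
    return xs


lam, m = Fraction(1, 2), Fraction(3)   # single eigenvalue 1/2 in [0, 1)
x_star = m / (1 - lam)                 # fixed point of the affine map
xs = iterate_affine(lam, m, Fraction(100), 50)
bound = max(abs(xs[0]), abs(x_star))   # bound on all states of this execution
print(all(abs(x) <= bound for x in xs))
```

In higher dimensions the same behavior holds on the generalized eigenspaces for these eigenvalues, although the contraction is then with respect to a suitably chosen norm.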

*Part 1.* In this part we show that there is a basis *y*<sub>1</sub>, ..., *y*<sub>s</sub> ∈ $\mathcal{Y}$ such that M turns into a matrix U of the form given in (1) with λ<sub>1</sub>, ..., λ<sub>s</sub>, μ<sub>1</sub>, ..., μ<sub>s−1</sub> ≥ 0. Since we allow μ<sub>k</sub> to be positive between different eigenvalues (Example 14 illustrates why), this is not necessarily a Jordan normal form and the vectors *y*<sub>i</sub> are not necessarily generalized eigenvectors.

We choose a basis *v*<sub>1</sub>, ..., *v*<sub>s</sub> such that M is in Jordan normal form with the eigenvalues ordered by size such that the largest eigenvalues come first. Define V<sub>1</sub> := $\overline{\mathcal{Y}} \cap \mathcal{U}^\perp$ and let V<sub>1</sub> ⊃ ... ⊃ V<sub>s</sub> be a strictly descending chain of linear subspaces where V<sub>k</sub> is spanned by *v*<sub>k</sub>, ..., *v*<sub>s</sub>.

We define a basis *w*<sub>1</sub>, ..., *w*<sub>s</sub> by doing the following for each Jordan block of M, starting with k = 1. Let M(k) be the projection of M to the linear subspace V<sub>k</sub> and let λ be the largest eigenvalue of M(k). The m-fold iteration of a Jordan block J<sub>ℓ</sub>(λ) with ℓ + 1 rows for m ≥ ℓ is given by

$$J\_{\ell}(\lambda)^{m} = \begin{pmatrix} \lambda^{m} & \binom{m}{1} \lambda^{m-1} & \dots & \binom{m}{\ell} \lambda^{m-\ell} \\ & \lambda^{m} & \dots & \binom{m}{\ell-1} \lambda^{m-\ell+1} \\ & & \ddots & \vdots \\ 0 & & & \lambda^{m} \end{pmatrix} \in \mathbb{R}^{(\ell+1) \times (\ell+1)}. \tag{7}$$

Let *z*<sub>0</sub>, *z*<sub>1</sub>, *z*<sub>2</sub>, ... be an infinite execution of the loop L in the basis *v*<sub>k</sub>, ..., *v*<sub>s</sub> projected to the space V<sub>k</sub>. Since by Lemma 12 we can assume that there are no fixed points on this space, |*z*<sub>t</sub>| → ∞ as t → ∞ in each of the top components. Asymptotically, the largest eigenvalue λ dominates, and in each row of J<sub>ℓ</sub>(λ)<sup>m</sup> in (7), the entry $\binom{m}{j} \lambda^{m-j}$ in the rightmost column grows the fastest, with an asymptotic rate of Θ(m<sup>j</sup> λ<sup>m</sup>). Therefore the sign of the component corresponding to the basis vector *v*<sub>k+ℓ</sub> determines whether the top entries tend to +∞ or −∞, but the top entries of *z*<sub>t</sub> corresponding to the top Jordan block will all have the same sign eventually. Because no state can violate the guard condition, the guard cannot constrain the infinite execution in the direction of *v*<sub>j</sub> or −*v*<sub>j</sub>, i.e., G<sup>V<sub>k</sub></sup>*v*<sub>j</sub> ≤ **0** for each j ∈ {k, ..., k + ℓ} or G<sup>V<sub>k</sub></sup>*v*<sub>j</sub> ≥ **0** for each j ∈ {k, ..., k + ℓ}, where G<sup>V<sub>k</sub></sup> is the projection of G to the subspace V<sub>k</sub>. Without loss of generality the former holds (otherwise we use −*v*<sub>j</sub> instead of *v*<sub>j</sub> for j ∈ {k, ..., k + ℓ}), and for j ∈ {k, ..., k + ℓ} we get *v*<sub>j</sub> ∈ $\mathcal{Y}$ + V<sub>k</sub><sup>⊥</sup>, where V<sub>k</sub><sup>⊥</sup> is the space spanned by *v*<sub>1</sub>, ..., *v*<sub>k−1</sub>. Hence there is a *u*<sub>j</sub> ∈ V<sub>k</sub><sup>⊥</sup> such that *w*<sub>j</sub> := *v*<sub>j</sub> + *u*<sub>j</sub> is an element of $\mathcal{Y}$. Now we move on to the subspace V<sub>k+ℓ+1</sub>, discarding the top Jordan block.
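The closed form (7) for powers of a Jordan block, and the claim that the rightmost column eventually dominates, can be checked numerically. The following sketch uses exact integer arithmetic; the helper names (`jordan_block`, `mat_pow`, `closed_form`) and the sample values λ = 3, block size 4, m = 30 are assumptions made here for illustration.

```python
from math import comb


def jordan_block(lam, size):
    """Upper-triangular Jordan block with `lam` on the diagonal and 1 above it."""
    return [[lam if i == j else 1 if j == i + 1 else 0 for j in range(size)]
            for i in range(size)]


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(a, m):
    """Compute a^m by repeated multiplication, starting from the identity."""
    n = len(a)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(m):
        result = mat_mul(result, a)
    return result


def closed_form(lam, size, m):
    """Entry C(m, d) * lam^(m - d) on the d-th superdiagonal; needs m >= size - 1."""
    return [[comb(m, j - i) * lam ** (m - (j - i)) if j >= i else 0
             for j in range(size)] for i in range(size)]


power = mat_pow(jordan_block(3, 4), 30)
formula_holds = power == closed_form(3, 4, 30)
top_row_increasing = power[0] == sorted(power[0])  # rightmost entry is the largest
print(formula_holds, top_row_increasing)
```

For small m the rightmost entry need not be the largest yet; the dominance is an asymptotic statement, which is why a moderately large exponent is used here.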

Let T be the matrix M written in the basis *w*<sub>1</sub>, ..., *w*<sub>s</sub>. Then T is of upper triangular form: whenever we apply M to *w*<sub>k</sub> we get λ<sub>k</sub>*w*<sub>k</sub> + *u*<sub>k</sub> (*w*<sub>k</sub> was an eigenvector in the space V<sub>k</sub>) where *u*<sub>k</sub> ∈ V<sub>k</sub><sup>⊥</sup>, the space spanned by *v*<sub>1</sub>, ..., *v*<sub>k−1</sub> (which is identical with the space spanned by *w*<sub>1</sub>, ..., *w*<sub>k−1</sub>). Moreover, since we processed every Jordan block entirely, for *w*<sub>k</sub> and *w*<sub>j</sub> from the same generalized eigenspace (T<sub>k,k</sub> = T<sub>j,j</sub>) with k > j we have

$$T\_{j,k} \in \{0, 1\} \text{ and } T\_{j,k} = 1 \text{ implies } k = j + 1. \tag{8}$$

In other words, when projected to any generalized eigenspace T consists only of Jordan blocks.

Now we change basis again in order to get the upper triangular matrix U defined in (1) from T. For this we define the vectors

$$y\_k := \beta\_k \sum\_{j=1}^k \alpha\_{k,j} w\_j$$

with nonnegative real numbers α<sub>k,j</sub> ≥ 0, α<sub>k,k</sub> > 0, and *β* > 0 to be determined later. Define the matrices W := (*w*<sub>1</sub> ... *w*<sub>s</sub>), Y := (*y*<sub>1</sub> ... *y*<sub>s</sub>), and α := (α<sub>k,j</sub>)<sub>1≤j≤k≤s</sub>. So α is a nonnegative lower triangular matrix with a positive diagonal and hence invertible. Since α and W are invertible, the matrix Y = diag(*β*)αW is invertible as well and thus the vectors *y*<sub>1</sub>, ..., *y*<sub>s</sub> form a basis. Moreover, we have *y*<sub>k</sub> ∈ $\mathcal{Y}$ for each k since α ≥ 0, *β* > 0, and $\mathcal{Y}$ is a convex cone. Therefore we get

$$GY \le 0.\tag{9}$$

We will first choose α. Define T =: D + N where D = diag(λ<sub>1</sub>, ..., λ<sub>s</sub>) is a diagonal matrix and N is nilpotent. Since *w*<sub>1</sub> is an eigenvector of M we have M*y*<sub>1</sub> = M*β*<sub>1</sub>α<sub>1,1</sub>*w*<sub>1</sub> = λ<sub>1</sub>*β*<sub>1</sub>α<sub>1,1</sub>*w*<sub>1</sub> = λ<sub>1</sub>*y*<sub>1</sub>. To get the form in (1), we need for all k > 1

$$My\_k = \lambda\_k y\_k + \mu\_{k-1} y\_{k-1}.\tag{10}$$


Written in the basis *w*<sub>1</sub>, ..., *w*<sub>s</sub> (i.e., multiplied with W<sup>−1</sup>),

$$(D+N)\beta\_k \sum\_{j \le k} \alpha\_{k,j} \mathbf{e}\_j = \lambda\_k \beta\_k \sum\_{j \le k} \alpha\_{k,j} \mathbf{e}\_j + \mu\_{k-1} \beta\_{k-1} \sum\_{j < k} \alpha\_{k-1,j} \mathbf{e}\_j.$$

Hence we want to pick α such that

$$\sum\_{j \le k} \alpha\_{k,j} (\lambda\_j - \lambda\_k) \mathbf{e}\_j + N \sum\_{j \le k} \alpha\_{k,j} \mathbf{e}\_j - \mu\_{k-1} \beta\_{k-1} \sum\_{j < k} \alpha\_{k-1,j} \mathbf{e}\_j = \mathbf{0}. \tag{11}$$

First note that these constraints are independent of *β* if we set μ<sub>k−1</sub> := *β*<sub>k−1</sub><sup>−1</sup> > 0, so we can leave assigning a value to *β* to a later part of the proof.

We distinguish two cases. First, if λ<sub>k−1</sub> ≠ λ<sub>k</sub>, then λ<sub>j</sub> − λ<sub>k</sub> is positive for all j < k because larger eigenvalues come first. Since N is nilpotent and upper triangular, $N \sum\_{j \le k} \alpha\_{k,j} \mathbf{e}\_j$ is a linear combination of *e*<sub>1</sub>, ..., *e*<sub>k−1</sub> (i.e., only the first k − 1 entries can be nonzero). Whatever values this vector assumes, we can increase the parameters α<sub>k,j</sub> for j < k to make (11) larger and increase the parameters α<sub>k−1,j</sub> for j < k to make (11) smaller.

Second, let ℓ be minimal such that λ<sub>ℓ</sub> = λ<sub>k</sub> with ℓ ≠ k; then *w*<sub>ℓ</sub>, ..., *w*<sub>k</sub> are from the same generalized eigenspace. For the rows 1, ..., ℓ − 1 we can proceed as we did in the first case, and for the rows ℓ, ..., k − 1 we note that by (8) N*e*<sub>j</sub> = T<sub>j−1,j</sub>*e*<sub>j−1</sub>. Hence the remaining constraints (11) are

$$\sum\_{\ell < j \le k} \alpha\_{k,j} T\_{j-1,j} \mathbf{e}\_{j-1} - \mu\_{k-1} \sum\_{\ell \le j < k} \alpha\_{k-1,j} \mathbf{e}\_j = \mathbf{0},$$

which is solved by α<sub>k,j+1</sub>T<sub>j,j+1</sub> = α<sub>k−1,j</sub> for ℓ ≤ j < k. This is only a problem if there is a j such that T<sub>j−1,j</sub> = 0, i.e., if there are multiple Jordan blocks for the same eigenvalue. In this case, we can reduce the dimension of the generalized eigenspace to the dimension of the largest Jordan block by combining all Jordan blocks: if M*y*<sub>k</sub> = λ*y*<sub>k</sub> + *y*<sub>k−1</sub> and M*y*<sub>j</sub> = λ*y*<sub>j</sub> + *y*<sub>j−1</sub>, then M(*y*<sub>k</sub> + *y*<sub>j</sub>) = λ(*y*<sub>k</sub> + *y*<sub>j</sub>) + (*y*<sub>k−1</sub> + *y*<sub>j−1</sub>), and if M*y*<sub>k</sub> = λ*y*<sub>k</sub> + *y*<sub>k−1</sub> and M*y*<sub>j</sub> = λ*y*<sub>j</sub>, then M(*y*<sub>k</sub> + *y*<sub>j</sub>) = λ(*y*<sub>k</sub> + *y*<sub>j</sub>) + *y*<sub>k−1</sub>. In both cases we can replace the basis vector *y*<sub>k</sub> with *y*<sub>k</sub> + *y*<sub>j</sub> without reducing the expressiveness of the GNTA.

Importantly, there are no cyclic dependencies in the values of α because none of the coefficients of α can be made too large. Therefore we can choose α ≥ 0 such that (10) is satisfied for all k > 1 and hence the basis *y*<sub>1</sub>, ..., *y*<sub>s</sub> brings M into the desired form (1).

*Part 2.* In this part we construct the geometric nontermination argument and check the constraints from Definition 5. Since L has an infinite execution, there is a point *x* that fulfills the guard, i.e., G*x* ≤ *g*. We choose *x*<sub>1</sub> := *x* + Y*γ* with *γ* ≥ **0** to be determined later. Moreover, we choose λ<sub>1</sub>, ..., λ<sub>s</sub> and μ<sub>1</sub>, ..., μ<sub>s−1</sub> from the entries of U given in (1). The size of our GNTA is s, the number of vectors *y*<sub>1</sub>, ..., *y*<sub>s</sub>. These vectors form a basis of $\overline{\mathcal{Y}} \cap \mathcal{U}^\perp$, which is a subspace of ℝ<sup>n</sup>; thus s ≤ n, as required.

The constraint (domain) is satisfied by construction and the constraint (init) is vacuous since L is a loop program. For (ray) note that from (9) and (10) we get

$$
\begin{pmatrix} G & 0 \\ M & -I \\ -M & I \end{pmatrix} \begin{pmatrix} y\_k \\ \lambda\_k y\_k + \mu\_{k-1} y\_{k-1} \\ \end{pmatrix} \le \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.
$$

The remainder of this proof shows that we can choose *β* and *γ* such that (point) is satisfied, i.e., that

$$Gx\_1 \le g \text{ and } Mx\_1 + m = x\_1 + Y\mathbf{1}. \tag{12}$$

The vector *x*<sub>1</sub> satisfies the guard since G*x*<sub>1</sub> = G*x* + GY*γ* ≤ *g* + **0** according to (9), which yields the first part of (12). For the second part we observe the following.

$$\begin{aligned} Mx\_1 + m &= x\_1 + Y\mathbf{1} \\ \iff &(M - I)(x + Y\gamma) + m = Y\mathbf{1} \\ \iff &(M - I)x + m = Y\mathbf{1} - (M - I)Y\gamma \end{aligned}$$

Since the columns of Y form a basis, Y is invertible, so

$$\begin{aligned} \iff & Y^{-1}(M-I)x + Y^{-1}m = \mathbf{1} - Y^{-1}(M-I)Y\gamma\\ \iff & (U-I)Y^{-1}x + Y^{-1}m = \mathbf{1} - (U-I)\gamma\\ \iff & (U-I)\tilde{x} + \tilde{m} = \mathbf{1} - (U-I)\gamma \end{aligned} \tag{13}$$

with $\tilde{x} := Y^{-1}x = W^{-1}\alpha^{-1}\operatorname{diag}(\beta)^{-1}x$ and $\tilde{m} := Y^{-1}m = W^{-1}\alpha^{-1}\operatorname{diag}(\beta)^{-1}m$. Equation (13) is now conveniently in the basis *y*<sub>1</sub>, ..., *y*<sub>s</sub> and all that remains to show is that we can choose *γ* ≥ **0** and *β* > 0 such that (13) is satisfied.

We proceed for each (not quite Jordan) block of U separately, i.e., we assume that we are looking at the subspace spanned by *y*<sub>j</sub>, ..., *y*<sub>k</sub> with μ<sub>k</sub> = μ<sub>j−1</sub> = 0 and μ<sub>ℓ</sub> > 0 for all ℓ ∈ {j, ..., k − 1}. If this space only contains eigenvalues that are larger than 1, then U − I is invertible and has only nonnegative entries. By using large enough values for *β*, we can make $\tilde{x}$ and $\tilde{m}$ small enough such that **1** ≥ (U − I)$\tilde{x}$ + $\tilde{m}$. Then we just need to pick *γ* appropriately.

If there is at least one eigenvalue 1, then U − I is not invertible, so (13) could be overconstrained. Notice that μ<sub>ℓ</sub> > 0 for all ℓ ∈ {j, ..., k − 1}, so only the bottom entry in the vector equation (13) is not covered by *γ*. Moreover, since eigenvalues are ordered in decreasing order and all eigenvalues in our current subspace are ≥ 1, we conclude that the eigenvalue for the bottom entry is 1. (Furthermore, k is the highest index since each eigenvalue occurs only in one block.) Thus we get the equation $\tilde{m}\_k$ = 1. If $\tilde{m}\_k$ is positive, this equation has a solution since we can adjust *β*<sub>k</sub> accordingly. If it is zero, then the execution on the space spanned by *y*<sub>k</sub> is bounded, which we can rule out by Lemma 12.

It remains to rule out that $\tilde{m}\_k$ is negative. Let $\mathcal{U}$ be the generalized eigenspace to the eigenvalue 1 and use Lemma 13 below to conclude that *o* := N<sup>s−1</sup>*m* + *u* ∈ $\mathcal{Y}$ for some *u* ∈ $\mathcal{U}^\perp$. We have that M*o* = M(N<sup>s−1</sup>*m* + *u*) = M*u* ∈ $\mathcal{U}^\perp$, so *o* is a candidate to pick for the vector *w*<sub>k</sub>. Therefore without loss of generality we did so in Part 1 of this proof, and since *y*<sub>k</sub> is in the convex cone spanned by the basis *w*<sub>1</sub>, ..., *w*<sub>s</sub> we get $\tilde{m}\_k$ > 0.

**Lemma 13 (Deterministic Loops with Eigenvalue 1).** *Let* M = I + N *and let* N *be nilpotent with nilpotence index* k *(*k := min{i | N<sup>i</sup> = 0}*). If* GN<sup>k−1</sup>*m* ≰ **0***, then* L *is terminating.*

*Proof.* We show termination by providing a k-nested ranking function [28, Definition 4.7]. By [28, Lemma 3.3] and [28, Theorem 4.10], this implies that L is terminating.

According to the premise, GN<sup>k−1</sup>*m* ≰ **0**, hence there is at least one positive entry in the vector GN<sup>k−1</sup>*m*. Let *h* be a row vector of G such that *h*<sup>T</sup>N<sup>k−1</sup>*m* =: δ > 0, and let h<sub>0</sub> ∈ ℝ be the corresponding entry in *g*. Let *x* be any state and let *x*′ be a next state after the loop transition, i.e., *x*′ = M*x* + *m*. Define the affine-linear functions f<sub>j</sub>(*x*) := −*h*<sup>T</sup>N<sup>k−j</sup>*x* + c<sub>j</sub> for 1 ≤ j ≤ k with constants c<sub>j</sub> ∈ ℝ to be determined later. Since every state *x* satisfies the guard we have *h*<sup>T</sup>*x* ≤ h<sub>0</sub>, hence f<sub>k</sub>(*x*) = −*h*<sup>T</sup>*x* + c<sub>k</sub> ≥ −h<sub>0</sub> + c<sub>k</sub> > 0 for c<sub>k</sub> := h<sub>0</sub> + 1. Moreover,

$$\begin{aligned} f\_1(x') = f\_1(x + Nx + m) &= -h^T N^{k-1} (x + Nx + m) + c\_1 \\ &= f\_1(x) - h^T N^k x - h^T N^{k-1} m \\ &= f\_1(x) - 0 - \delta < f\_1(x). \end{aligned}$$

For 1 < j ≤ k,

$$\begin{aligned} f\_j(x') = f\_j(x + Nx + m) &= -h^T N^{k-j} (x + Nx + m) + c\_j \\ &= f\_j(x) + f\_{j-1}(x) - h^T N^{k-j} m - c\_{j-1} \\ &< f\_j(x) + f\_{j-1}(x) \end{aligned}$$

for c<sub>j−1</sub> := −*h*<sup>T</sup>N<sup>k−j</sup>*m* + 1, since then −*h*<sup>T</sup>N<sup>k−j</sup>*m* − c<sub>j−1</sub> = −1 < 0.
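The nested ranking function from this proof can be evaluated on a concrete loop. The sketch below uses a hypothetical instance that is not from the paper: `while (x1 <= 10): x1 := x1 + x2; x2 := x2 + 1`, so N has nilpotence index k = 2, the guard has the single row h = (1, 0) with h0 = 10, and δ = 1. As an assumption of this sketch, the constants are taken as c_k := h0 + 1 and c_(j−1) := −hᵀN^(k−j)m + 1, which is the choice that makes both strict inequalities hold for this instance.

```python
def mat_vec(a, v):
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def dot(u, v):
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def power_apply(n_mat, exponent, v):
    """Return N^exponent * v by repeated matrix-vector multiplication."""
    for _ in range(exponent):
        v = mat_vec(n_mat, v)
    return v


# Hypothetical instance: while (x1 <= 10): x1 := x1 + x2; x2 := x2 + 1
N = [[0, 1], [0, 0]]   # nilpotent part of M = I + N, with N^2 = 0
m = [0, 1]
h, h0 = [1, 0], 10     # guard row h^T x <= h0
k = 2

delta = dot(h, power_apply(N, k - 1, m))   # h^T N^(k-1) m

# Assumed constants: c_k = h0 + 1 and c_(j-1) = -h^T N^(k-j) m + 1.
c = {k: h0 + 1}
for j in range(k, 1, -1):
    c[j - 1] = -dot(h, power_apply(N, k - j, m)) + 1


def f(j, x):
    return -dot(h, power_apply(N, k - j, x)) + c[j]


def step(x):
    nx = mat_vec(N, x)
    return [x_i + nx_i + m_i for x_i, nx_i, m_i in zip(x, nx, m)]


def is_nested_ranking_on(states):
    """Check the nested ranking conditions on all sample states inside the guard."""
    for x in states:
        if dot(h, x) > h0:
            continue  # state violates the guard, nothing to check
        x_next = step(x)
        if not f(k, x) > 0:
            return False
        if not f(1, x_next) <= f(1, x) - delta:
            return False
        if any(not f(j, x_next) < f(j, x) + f(j - 1, x) for j in range(2, k + 1)):
            return False
    return True


samples = [[a, b] for a in range(-20, 11) for b in range(-20, 21)]
ranking_ok = is_nested_ranking_on(samples)
print(delta, c, ranking_ok)
```

The check only samples integer states, so it illustrates the inequalities from the proof rather than replacing them.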

*Example 14 (*U *is not in Jordan Form).* The matrix U defined in (1) and used in the completeness proof is generally *not* the Jordan normal form of the loop's transition matrix M. Consider the following linear loop program.

$$\begin{array}{rcl} \text{while} & (a - b \ge 0 \land b \ge 0) : \\ a & := \ 3a ; \\ b & := \ b + 1 ; \end{array}$$

This program is nonterminating because a grows exponentially and hence faster than b. It has the geometric nontermination argument

$$x\_0 = \begin{pmatrix} 9 \\ 1 \end{pmatrix}, \quad x\_1 = \begin{pmatrix} 9 \\ 1 \end{pmatrix}, \quad y\_1 = \begin{pmatrix} 12 \\ 0 \end{pmatrix}, \quad y\_2 = \begin{pmatrix} 6 \\ 1 \end{pmatrix}, \quad \lambda\_1 = 3, \quad \lambda\_2 = 1, \quad \mu\_1 = 1.$$

The matrix corresponding to the linear loop update is

$$M = \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}$$

which is diagonal (hence diagonalizable). Therefore M is already in Jordan normal form. The matrix U defined according to (1) is

$$U = \begin{pmatrix} 3 & 1 \\ 0 & 1 \end{pmatrix}.$$

The nilpotent component μ<sub>1</sub> = 1 is important: there is no GNTA for this loop program where μ<sub>1</sub> = 0, since the eigenspace to the eigenvalue 1 is spanned by (0 1)<sup>T</sup>, which lies in the linear subspace spanned by the cone $\mathcal{Y}$ of guard rays but not in $\mathcal{Y}$ itself. ♦
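The argument of Example 14 can be checked mechanically. The sketch below encodes the guard a − b ≥ 0 ∧ b ≥ 0 as G*x* ≤ *g* and tests the constraints in the form used in the completeness proof, namely (12) for (point) and (9) together with (10) for (ray); the function name `check_gnta` and the list-based encoding are assumptions made here for illustration.

```python
def mat_vec(a, v):
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def add(u, v):
    return [u_i + v_i for u_i, v_i in zip(u, v)]


def scale(factor, v):
    return [factor * v_i for v_i in v]


def leq(u, v):
    return all(u_i <= v_i for u_i, v_i in zip(u, v))


# Guard a - b >= 0 and b >= 0, written as G x <= g; update x := M x + m.
G, g = [[-1, 1], [0, -1]], [0, 0]
M, m = [[3, 0], [0, 1]], [0, 1]

x1 = [9, 1]
ys = [[12, 0], [6, 1]]
lams = [3, 1]
mus = [1]


def check_gnta(x1, ys, lams, mus):
    """Check (domain), (point) as in (12), and (ray) as in (9) and (10)."""
    zero = [0] * len(x1)
    domain = all(v >= 0 for v in lams + mus)
    y_sum = zero
    for y in ys:
        y_sum = add(y_sum, y)
    point = leq(mat_vec(G, x1), g) and add(mat_vec(M, x1), m) == add(x1, y_sum)
    ray = True
    for idx, y in enumerate(ys):
        prev = scale(mus[idx - 1], ys[idx - 1]) if idx > 0 else zero
        target = add(scale(lams[idx], y), prev)   # lambda_k y_k + mu_(k-1) y_(k-1)
        ray = ray and leq(mat_vec(G, y), zero) and mat_vec(M, y) == target
    return domain and point and ray


valid = check_gnta(x1, ys, lams, mus)
valid_without_mu = check_gnta(x1, ys, lams, [0])   # same vectors, but mu_1 = 0
print(valid, valid_without_mu)
```

The second call only shows that these particular vectors stop satisfying (ray) once μ<sub>1</sub> is set to 0; the stronger statement that no GNTA with μ<sub>1</sub> = 0 exists is the one argued in the text.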

#### **5 Experiments**

We implemented our method in a tool that is specialized for the analysis of lasso programs and called Ultimate LassoRanker<sup>2</sup>. LassoRanker is used by Ultimate Büchi Automizer [22], which analyzes termination of (general) C programs. Büchi Automizer iteratively picks lasso-shaped paths in the control flow graph, converts them to lasso programs, and lets LassoRanker analyze them. If LassoRanker is able to prove nontermination, a real counterexample to termination has been found. If LassoRanker is able to provide a termination argument (e.g., a linear ranking function), Büchi Automizer continues the analysis, but only on lasso-shaped paths for which the termination arguments obtained in former iterations are not applicable.

<sup>2</sup> http://ultimate.informatik.uni-freiburg.de/lasso ranker/.

We applied Büchi Automizer to the 803 C programs from the Termination Competition 2017<sup>3</sup>. Our constraints for the existence of a geometric nontermination argument (GNTA) were stated over the integers and we used the SMT solver Z3 [23] with a timeout of 12 s to solve these constraints. The overall timeout for the termination analysis was 60 s. In our implementation, LassoRanker first tries to find a fixpoint for a lasso, and only if no fixpoint exists does it try to find a GNTA that can also represent an unbounded execution. The tool was able to identify 143 nonterminating programs. For 82 of these a fixpoint was detected. For the other 61 programs the counterexample had only an unbounded execution but no fixpoint.

This experiment demonstrates that, despite the nonlinear integer constraints, the synthesis of GNTAs is feasible in practice and that, furthermore, GNTAs that can also represent unbounded executions improved Büchi Automizer significantly.

#### **6 Related Work**

One line of related work is focused on decidability questions for deterministic lasso programs. Tiwari [38] considered linear loop programs over the reals where only strict inequalities are used in the guard and proved that termination is decidable. Braverman [5] generalized this result to loop programs that use strict and non-strict inequalities in the guard. Furthermore, he proved that termination is also decidable for homogeneous deterministic loop programs over the integers. Rebiha et al. [35] generalized the result to integer loops where the update matrix has only real eigenvalues. Ouaknine et al. [30] generalized the result to integer lassos where the update matrix of the loop is diagonalizable.

Another line of related work is also applicable to nondeterministic programs and uses a constraint-based synthesis of recurrence sets. The recurrence sets are defined by templates [20,39] or the constraint is given in a second order theory for bit vectors [17]. These approaches can be used to find nonterminating lassos that do not have a geometric nontermination argument; however, this comes at the price that for nondeterministic programs an ∃∀∃-constraint has to be solved.

Furthermore, there is a long line of research [2,3,8–10,12,17,26,27] that addresses programs that are more general than lasso programs.

#### **7 Conclusion**

We presented a new approach to nontermination analysis for (nondeterministic) linear lasso programs. This approach is based on geometric nontermination arguments, which are an explicit representation of an infinite execution. Unlike with, e.g., a recurrence set, which encodes a set of nonterminating executions, a user can immediately see whether our nontermination proof encodes a fixpoint or a diverging unbounded execution. Our nontermination arguments can be found by solving a set of nonlinear constraints. In Sect. 4 we showed that the class of nonterminating linear lasso programs that have a geometric nontermination argument is quite large: it contains at least every deterministic linear loop program whose eigenvalues are nonnegative. We expect that this statement can be extended to encompass also negative and complex eigenvalues.

<sup>3</sup> http://termination-portal.org/wiki/Termination Competition 2017.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Hybrid and Stochastic Systems

# **Efficient Dynamic Error Reduction for Hybrid Systems Reachability Analysis**

Stefan Schupp and Erika Ábrahám

RWTH Aachen University, Aachen, Germany stefan.schupp@cs.rwth-aachen.de

**Abstract.** To decide whether a set of states is reachable in a hybrid system, over-approximative symbolic successor computations can be used, where the symbolic representation of state sets as well as the successor computations have several parameters which determine the efficiency and the precision of the computations. Naturally, faster computations come with less precision and more spurious counterexamples. To remove a spurious counterexample, the only possibility offered by current tools is to reduce the error by re-starting the complete search with different parameters. In this paper we propose a CEGAR approach that takes as input a user-defined ordered list of search configurations, which are used to dynamically refine the search tree along potentially spurious counterexamples. Dedicated data structures make it possible to extract as much useful information as possible from previous computations in order to reduce the refinement overhead.

### **1 Introduction**

As the correct behavior of *hybrid systems* with mixed discrete-continuous behavior is often safety critical, a lot of effort was put into the development and implementation of techniques for their analysis. In this paper we focus on techniques for proving unreachability of a given set of unsafe states. Besides methods based on theorem proving [11,21,25], logical encoding [13,15,22,26] and validated simulation [12,28], *flowpipe-construction-based methods* [2,7,9,17–20,27] show increasing performance and usability. These methods over-approximate the set of states that are reachable in a hybrid system from a given set of initial states by executing an iterative forward reachability analysis algorithm. The result is a sequence of state sets whose union contains all system paths starting in any initial state (usually for bounded time duration and a bounded number of discrete steps, unless a fixedpoint could be detected).

If the resulting over-approximation does not intersect with the unsafe state set then the verification task is successfully completed. However, if the intersection is not empty, due to the over-approximation the results are not conclusive. In this case the only possibility for achieving a conclusive answer is to change some analysis parameters to reduce the approximation error. As a smaller error typically comes with a higher computational effort, the choice of suitable parameters by the user can be a tedious task.

This work was supported by the German research council (DFG) in the context of the HyPro project and the DFG Research Training Group 2236 UnRAVeL.

© The Author(s) 2018

D. Beyer and M. Huisman (Eds.): TACAS 2018, LNCS 10806, pp. 287–302, 2018. https://doi.org/10.1007/978-3-319-89963-3\_17

Most tools do not support the dynamic change of those parameters, thus after the modification of the parameters the user has to re-start the whole computation. One of the few tools implementing some hard-coded dynamic parameter adaptations is the STC mode [16] of SpaceEx [17], which dynamically adapts the time-step size during reachability analysis to detect the enabledness of discrete events more precisely. Another parameter (the degree of Taylor approximations) is dynamically adapted in the Flow<sup>∗</sup> tool [9]. The method [5], also implemented in SpaceEx, uses cheap (but stronger over-approximating) computations to detect potentially unsafe paths and uses this information to guide more precise (and more time-consuming) computations. In [6] the authors present a method to automatically derive template directions in a CEGAR refinement fashion during analysis when template polyhedra are used as the state set representation. As a last example, in [24] the authors use model abstraction to hide model details and apply model refinement if potential counterexamples are detected; after each refinement, the approach makes use of previous reachability analysis results and adapts them for the refined model, instead of performing a complete restart.

However, none of the available tools supports the dynamic adjustments of several parameters by a more elaborate strategy, which is either defined by the user or chosen from a pre-defined set. In this paper we propose such an approach, provide an implementation based on the HyPro [27] programming library, present some use cases to demonstrate its applicability and advantages, and discuss ideas for further extensions and improvements. Our main contributions are:


*Outline.* In Sect. 2 we recall some preliminaries on flowpipe-construction-based reachability analysis, before presenting our algorithm for the dynamic adjustment of parameter configurations in Sect. 3. In Sect. 4 we provide some experimental results and conclude the paper in Sect. 5.

### **2 Preliminaries**

In this work we develop a method to dynamically adjust the parameters of a verification method for *autonomous linear hybrid systems* whose continuous dynamics can be described by *ordinary differential equations* (*ODEs*) of the form $\dot{x}(t) = A \cdot x(t)$, but our approach can be naturally extended to methods for non-autonomous hybrid systems with external input or non-linear dynamics.
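For intuition, recall the standard fact (not spelled out above) that the solution of $\dot{x}(t) = A \cdot x(t)$ starting in $x(0)$ is $x(t) = e^{At} x(0)$. The following minimal Python sketch approximates $e^{At}$ by a truncated Taylor series. The helper names `mat_mul`, `mat_exp` and `mat_vec`, the number of series terms, and the oscillator example are illustrative assumptions and not part of the presented method.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(a, t, terms=30):
    """Approximate exp(a * t) by the first `terms` terms of its Taylor series."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]              # holds (a*t)^k / k!
    scaled = [[a[i][j] * t for j in range(n)] for i in range(n)]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def mat_vec(m, x):
    """Apply matrix m to vector x."""
    return [sum(m[i][j] * x[j] for j in range(len(x))) for i in range(len(m))]

# Illustrative dynamics (an assumption): x' = y, y' = -x.
A = [[0.0, 1.0], [-1.0, 0.0]]
x_quarter = mat_vec(mat_exp(A, math.pi / 2), [1.0, 0.0])
```

For this particular matrix $e^{At}$ is a rotation, so after time $\pi/2$ the point $(1, 0)$ is mapped (up to rounding) to $(0, -1)$.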

*Hybrid automata* [3] are one of the modeling formalisms for hybrid systems. Similarly to discrete transition systems, nodes (called *locations* or *control modi*) model the discrete part of the state space (e.g. the states of a discrete controller) and transitions between the nodes (called *jumps*) labeled with guards and reset functions model discrete state changes. To model the continuous dynamics between discrete state changes, *flows* in the form of ordinary differential equation (ODE) systems, and *invariants* in the form of predicates over the model variables are attached to the locations. The ODEs specify the evolution of the continuous quantities over time (called the *flowpipe*), where the control is forced to leave the current location before its invariant gets violated. Initial predicates attached to the locations specify the initial states.

A *state* $\sigma = (\ell, \nu)$ of a hybrid automaton consists of a location $\ell$ and a variable valuation $\nu$. A *region* is a set of states $(\ell, P) = \{\ell\} \times P$. A *path* $\pi$ of a hybrid automaton is a sequence $\pi = \sigma_0 \xrightarrow{t_0} \sigma_1 \xrightarrow{e_1} \sigma_2 \xrightarrow{t_2} \dots$ of time steps $\sigma_i \xrightarrow{t_i} \sigma_{i+1}$ of duration $t_i$ and discrete steps $\sigma_k \xrightarrow{e_k} \sigma_{k+1}$ following a jump, where $\sigma_0 = (\ell_0, \nu_0)$ is an initial state. A state is called *reachable* if there exists a path leading to it.

*Flowpipe-construction-based reachability analysis* aims at determining the states that are reachable in (a model of) a hybrid system, in order to show that certain unsafe states cannot be reached. Since the reachability problem for hybrid systems is in general undecidable, these methods usually *over-approximate* the set of states that are reachable along paths with a bounded number of jumps (called the *jump depth*) J and a bounded time duration T (called the *time horizon*) between two jumps. We explain the basic ideas needed to understand our contributions; for further reading we refer to, e.g., [8,23].

Starting from an initial region $(\ell_0, V_0)$, the analysis over-approximates flowpipes and jump successors iteratively. Due to non-determinism, this generates a *tree*, whose nodes $n_i$ are either *unprocessed* leaves storing a tuple $(\pi_i; \ell_i, V_i; \bot)$, or *processed* inner nodes storing $(\pi_i; \ell_i, V_i; V_{i,0}, \dots, V_{i,k_i})$.

The pair $(\ell_i, V_i)$ is the node's *initial region*, which is $(\ell_0, V_0)$ for the *root*. By $\pi_i = I_{i,0}, e_{i,0}, \dots, I_{i,d_i}, e_{i,d_i}$, with $I_{i,l}$ being intervals and $e_{i,l}$ being jumps, we encode a set $\{\sigma_0 \xrightarrow{t_0} \sigma'_0 \xrightarrow{e_{i,0}} \sigma_1 \dots \xrightarrow{e_{i,d_i}} \sigma_{d_i+1} \mid \sigma_0 \in (\ell_0, V_0),\ t_l \in I_{i,l}\}$ of paths along which $(\ell_i, V_i)$ is reachable.

To process a node $(\pi_i; \ell_i, V_i; \bot)$, we divide the time horizon $[0, T]$ into segments $[t_{i,0}, t_{i,1}], \dots, [t_{i,k_i}, t_{i,k_i+1}]$ with $t_{i,0} = 0$ and $t_{i,k_i+1} = T$, and for each segment $[t_{i,j}, t_{i,j+1}]$ we compute an over-approximation $V_{i,j}$ of the states reachable from $V_i$ in $\ell_i$ within time $[t_{i,j}, t_{i,j+1}]$. That is, $R_i = \bigcup_{j=0}^{k_i} V_{i,j}$ contains all valuations reachable in location $\ell_i$ from $V_i$ within time $T$. The segmentation is usually homogeneous, meaning that the *time-step size* $t_{i,j+1} - t_{i,j}$ is constant, but there are also approaches for dynamic adaptations.
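The homogeneous segmentation of the time horizon described above can be sketched as follows. This is a minimal illustration under the assumption that a final, shorter segment is allowed when the time-step size does not divide $T$; the function name `segment_time_horizon` is hypothetical.

```python
def segment_time_horizon(T, delta):
    """Split [0, T] into consecutive segments of length delta.

    The last segment is cut off at T, so it may be shorter than delta.
    """
    if T <= 0 or delta <= 0:
        raise ValueError("T and delta must be positive")
    segments = []
    j = 0
    while j * delta < T:
        start = j * delta
        end = min((j + 1) * delta, T)
        segments.append((start, end))
        j += 1
    return segments
```

A smaller `delta` yields more segments and hence more state sets $V_{i,j}$ to compute, which is the precision versus effort trade-off discussed in the text.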

The processing is completed by computing for each *flowpipe segment* $V_{i,j}$ and each jump $e$ from $\ell_i$ to some $\ell_{i'}$ an over-approximation $V^e_{i,j}$ of the valuations reachable from $V_{i,j}$ by executing $e$. To store the jump successors, either we add a child node $(\pi_i, [t_{i,j}, t_{i,j+1}], e; \ell_{i'}, V^e_{i,j}; \bot)$ to $n_i$ for each $V^e_{i,j} \neq \emptyset$, or we *aggregate* successors along a jump $e$ into a single child node $(\pi_i, [t_{i,j}, t_{i,j'}], e; \ell_{i'}, R^e_i; \bot)$ with $V^e_{i,l} = \emptyset$ for all $l \notin [j, j'-1]$ and $\bigcup_{j'' \in [j, j'-1]} V^e_{i,j''} \subseteq R^e_i$, or we *cluster* successors along a jump into a fixed number of child nodes (see Fig. 3).
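The difference between aggregation and clustering can be illustrated with boxes as state sets. The sketch below is only an assumption-laden illustration: it represents a box as a list of `(lo, hi)` pairs, uses the bounding box as the over-approximating union, and groups consecutive successors into clusters, which is one possible clustering heuristic and not necessarily the one used by any tool.

```python
def box_hull(boxes):
    """Smallest axis-aligned box containing all given boxes."""
    dims = len(boxes[0])
    return [(min(b[d][0] for b in boxes), max(b[d][1] for b in boxes))
            for d in range(dims)]

def aggregate(successors):
    """Aggregation: a single set covering all non-empty jump successors."""
    return [box_hull(successors)] if successors else []

def cluster(successors, max_clusters):
    """Clustering: group consecutive successors into at most max_clusters hulls."""
    if max_clusters < 1:
        raise ValueError("max_clusters must be at least 1")
    if not successors:
        return []
    k = min(max_clusters, len(successors))
    size, extra = divmod(len(successors), k)
    clusters, start = [], 0
    for c in range(k):
        end = start + size + (1 if c < extra else 0)
        clusters.append(box_hull(successors[start:end]))
        start = end
    return clusters

def volume(box):
    """Product of the side lengths of a box."""
    v = 1.0
    for lo, hi in box:
        v *= hi - lo
    return v
```

For six unit boxes placed along a diagonal, aggregation produces one box of volume 36, clustering into two sets produces a total volume of 18, and keeping all six successors separately gives a total volume of 6, at the price of six child nodes.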

For illustration purposes, above we stored all flowpipe segments $V_{i,j}$ in the nodes. In practice they are too numerous, and if they contain no unsafe states then they are deleted. In the following, we assume that each node stores a tuple $(\pi_i; \ell_i, V_i; p)$, where the flag $p$ is 1 for processed nodes and 0 otherwise. (For a simple reachability analysis, we need to store neither the path nor the processed flag, but we will make use of the information stored in them later on. Furthermore, we could even delete the initial regions of processed nodes; however, besides counterexample and further output generation, they might also be useful for fixedpoint detection.)

**Fig. 1.** Polytope (green) and box (hatched) approx. of state set $V_0$. (Color figure online)

*State set representations* are one of the core components in the above analysis procedure. In addition to the storage of state sets, these datatypes need to provide certain (over-approximative) operations (union, intersection, linear transformation, Minkowski sum etc.) on state sets. Besides geometric representations (e.g., boxes/hyperrectangles, oriented rectangular hulls, convex polyhedra, template polyhedra, orthogonal polyhedra, zonotopes, ellipsoids) also symbolic representations (e.g., support functions or Taylor models) can be used for this purpose. The variety of representations is rooted in the general problem of deciding between computational effort and precision. Generally, faster computations often come at the cost of precision loss and vice versa, more precise computations need higher computational effort. The representations might differ in their size, i.e., the required memory consumption, which has a further influence on the computational costs for operations on these representations.
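The precision loss of a cheap representation can be made concrete with a small two-dimensional example. The sketch below compares the area of a polygon with the area of its bounding box; the polygon `diamond` and all function names are illustrative assumptions chosen for this example only.

```python
def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula (vertices in order)."""
    n = len(vertices)
    twice = sum(vertices[i][0] * vertices[(i + 1) % n][1]
                - vertices[(i + 1) % n][0] * vertices[i][1]
                for i in range(n))
    return abs(twice) / 2.0

def bounding_box(vertices):
    """Axis-aligned box over-approximating the polygon."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), max(xs)), (min(ys), max(ys))

def box_area(box):
    (x_lo, x_hi), (y_lo, y_hi) = box
    return (x_hi - x_lo) * (y_hi - y_lo)

# A square rotated by 45 degrees: the worst case for an axis-aligned box.
diamond = [(1.0, 0.0), (2.0, 1.0), (1.0, 2.0), (0.0, 1.0)]
```

Here the box needs only two intervals to store and supports very cheap operations, but it covers twice the area of the polygon it encloses.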

## **3 CEGAR-Based Reachability Analysis**

If potential reachability of an unsafe state is detected by over-approximative computations, in order to achieve a conclusive verification result, we need to *reduce the over-approximation error* to an extent that allows us to determine that the counterexample is spurious.

**Fig. 2.** Reduction and time-step size influence the flowpipe over-approximation error. (Color figure online)

*Search parameters, parameter configurations and search strategies.* The size of the over-approximation error depends on various search parameters, which influence not only the precision but also the computational effort of the performed analysis:


**Fig. 3.** Six sets (gray), a guard (light green), the aggregation of their intersections (left, thick line), and the clustering of their intersections into two sets (right, thick lines); both aggregation and clustering introduce additional error (dark green and dark blue). (Color figure online)

switching off both aggregation and clustering often leads to practically intractable computational costs. Allowing a larger number of clusters can improve the precision with a manageable increase in the running times, but the number of clusters should be chosen carefully, considering also the size of the time steps (as they determine the number of flowpipe segments and thus the number of state sets to be clustered).

5. *Splitting initial sets*: Large initial state sets might be challenging for the reachability analysis. If the algorithm cannot find a conclusive answer, we can split the initial set into several subsets and apply reachability analysis to each of the subsets. Besides the enabling/disabling of initial state set splitting, the splitting heuristic is also relevant for the precision. In general, a smaller number of initial state sets is less precise but cheaper to compute with. Furthermore, it might also be relevant where the splitting takes place.

Most flowpipe-construction-based tools allow the user to define a *search parameter configuration*, fixing values for the above-listed search parameters. Aside from a few exceptions mentioned in the introduction, this configuration remains constant during the whole analysis. Whenever an unsafe state is detected to be potentially reachable, the user can re-start the analysis with a different parameter configuration to reduce the over-approximation error.

As the executions with different parameter configurations are completely independent, potentially useful information from previous search processes gets lost. To enable the exploitation of such information, we propose an approach to build a connection between executions with different parameter configurations.

Instead of a single configuration, we propose to define an ordered sequence $c_0, \dots, c_n$ of search parameter configurations, which we call a *search strategy*; the position of a parameter configuration within a search strategy is called its *refinement level*. Configurations at higher refinement levels should typically lead to more precise computations, but this is not a soundness requirement.
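A search strategy can be thought of as a plain ordered collection of configurations, indexed by refinement level. The following sketch is one possible encoding; the class `SearchConfiguration`, its field names, and the concrete values in `example_strategy` are hypothetical and are not taken from the implementation or from Table 1.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass(frozen=True)
class SearchConfiguration:
    representation: str                 # e.g. "box" or "support_function"
    time_step: float                    # time-step size
    aggregation: bool = True            # aggregate jump successors?
    max_clusters: Optional[int] = None  # only meaningful without aggregation

def configuration_at(strategy: Sequence[SearchConfiguration],
                     level: int) -> SearchConfiguration:
    """The refinement level is simply the position in the strategy."""
    return strategy[level]

def next_level(strategy: Sequence[SearchConfiguration],
               level: int) -> Optional[int]:
    """Next refinement level, or None if `level` is already the highest."""
    return level + 1 if level + 1 < len(strategy) else None

# Made-up two-level strategy: coarse boxes first, then support functions.
example_strategy = (
    SearchConfiguration("box", 0.1),
    SearchConfiguration("support_function", 0.01),
)
```

Nothing in this encoding enforces that later configurations are more precise, which matches the remark that increasing precision is typical but not a soundness requirement.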

*Dynamic configuration adaptation.* We start the analysis with the first configuration in the search strategy, i.e. the one at refinement level 0. If the analysis with this configuration can prove safety then the process is completed.

Otherwise, if the reachability computation detects a (potentially spurious) counterexample then the search with the current configuration is paused; note that at this point there might be unprocessed nodes whose successors were not yet computed. Now, our goal is to exclude the detected counterexample by doing as few computations as possible using configurations at higher refinement levels and, if we succeed, process those yet unprocessed nodes further at refinement level 0. For the first counterexample this means intuitively re-computing reachability only along the counterexample path with the configuration at refinement level 1; we say that we *refine the path*. Note that the result of a path refinement can be a tree, e.g. if the refinement switched off aggregation. If the counterexample could be excluded by the path refinement, then we switch back to the previous refinement level to process the remaining, yet unprocessed nodes. Otherwise, if the counterexample could not be excluded then we get another, refined counterexample; in this case we recursively try to exclude this counterexample by switching to the configuration at the second refinement level etc.

Let us first clarify what we mean by *refining a counterexample path*. We define a counterexample to be a path in the search tree. If the configuration, which created the counterexample, used aggregation then it means determining the flowpipes and the jump successors for the given sequence of locations (as stored in the nodes on the path) and jumps (as stored on the edges) with the configuration at the next-higher refinement level. However, if the previous configuration did not aggregate then we need to determine only a subset of the jump successors, namely those whose time point is covered by the counterexample.

Now let us discuss what it means to refine a path *by doing as few computations as possible*. If we find a counterexample at a refinement level i then we need a refinement for the whole path at level i + 1. However, another counterexample detected previously at level i might share a prefix with the current one; if the previous counterexample has already been refined then we need to refine only the not-yet-refined postfix of the current counterexample.
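The prefix-sharing idea can be sketched as follows. As a simplifying assumption, a path is represented here only by its sequence of jump labels, ignoring the time intervals that the actual path encoding also carries; the function names are hypothetical.

```python
def common_prefix_length(p, q):
    """Number of leading elements on which the two paths agree."""
    n = 0
    for a, b in zip(p, q):
        if a != b:
            break
        n += 1
    return n

def unrefined_suffix(counterexample, refined_paths):
    """Part of `counterexample` not covered by any already refined path."""
    longest = max((common_prefix_length(counterexample, r)
                   for r in refined_paths), default=0)
    return counterexample[longest:]
```

If, for example, the path `("e1", "e2", "e3")` has already been refined, then for the new counterexample `("e1", "e2", "e4")` only the last jump remains to be refined.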

The analysis at refinement level 0 and each path refinement computation generates a search tree. To reduce the computational effort as much as possible, we have to exchange information between these search trees. For example, for a given counterexample found at refinement level $i$ we need to know whether a prefix of it was already refined at level $i+1$. To allow such information exchange, we could store each search tree separately and extract information from the trees when needed by traversing them. This option requires the least management overhead during reachability computations but it has major drawbacks in terms of the computational costs for tree traversal. Alternatively, we could store each search tree separately but store in addition refinement relations between their nodes, allowing us to relate paths and retrieve information more easily. However, we would have high costs for setting up and storing all node relations. Instead, we decided to collect all information in a single *refinement tree*. Tree updates require a careful management of the refinement nodes and their successors, but the advantage is that information about previous searches is more easily accessible.

Next we discuss how nodes of the refinement tree are processed and how paths in the refinement tree are refined, and finally we explain our dynamic parameter refinement algorithm.

*The algorithm.* Each refinement tree node $n_i$ is a kind of "meta-node" that contains an ordered sequence $(n_i^0, \dots, n_i^{u_i})$ with $0 \le u_i \le n$, where $n + 1$ is the size of the search strategy, and each entry $n_i^j$ has the form $(\pi; \ell, V; p)$ as explained in Sect. 2.

Assume for simplicity that the model has a single initial region $(\ell_0, X_0)$, and let $V_{0,i}$ represent $X_0$ according to the state set representation of refinement level $i$. The refinement tree is initialized with a root node $n_0 = (n_0^0, \dots, n_0^n)$ with $n_0^i = (\epsilon; \ell_0, V_{0,i}; 0)$.

We additionally introduce a *task list* which is initialized to contain $(n_0; 0; \epsilon)$ only. Elements $(n_i; j; \pi)$ in the task list store the fact that we need to compute successors for the $j$th element of the refinement node $n_i$ at level $j$. If $\pi = \epsilon$ then we are not refining and we need to consider all the successors for further computations, otherwise we are at a refinement level $j > 0$ and only the successors along the counterexample path $\pi$ need to be considered.
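The refinement tree and the task list described above could be represented along the following lines. This is a sketch under assumptions: the class names `Entry` and `RefinementNode`, the use of a `deque` as the task list, and the encoding of the empty path as an empty tuple are illustrative choices, not the data structures of the actual implementation.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, List, Tuple

@dataclass
class Entry:
    path: Tuple              # path leading to this entry (intervals and jumps)
    location: str            # location of the initial region
    state_set: Any           # initial set in the representation of this level
    processed: bool = False  # the flag p

@dataclass
class RefinementNode:
    # entries[j] is the entry for refinement level j
    entries: List[Entry] = field(default_factory=list)
    children: List["RefinementNode"] = field(default_factory=list)

def initialize(location, initial_sets):
    """Build the root (one entry per level) and the initial task list.

    initial_sets[i] is the initial set in the representation of level i.
    """
    root = RefinementNode([Entry((), location, v) for v in initial_sets])
    tasks = deque([(root, 0, ())])  # (node, level, counterexample path)
    return root, tasks
```

A task with an empty path component corresponds to ordinary exploration, whereas a non-empty path restricts successor computation to a counterexample path.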

We remove and process elements from the task list one by one. Assume we consider the task list element $(n_i; j; \pi')$ with $n_i^j = (\pi; \ell, V; p)$.

If $p = 0$ then we over-approximate the flowpipe starting from $V$ in $\ell$ for the time horizon $T$, using the configuration at level $j$ in the search strategy.

If the computed flowpipe segments contain no bad states and the jump depth $J$ is not yet reached then we also compute the jump successors. Depending on the clustering/aggregation settings at level $j$, this yields a set of jump successor regions $R_1, \dots, R_m$ with $R_k = (\ell_k, V_k)$ over time intervals $I_1, \dots, I_m$ along jumps $e_1, \dots, e_m$. If the number of children $m'$ of $n_i$ is less than $m$ then we add $m - m'$ new children; if $m' > 0$ then we add to the newly created children as many dummy entries (containing empty sets) as the other children have, in order to bring all children to the same refinement level. After that, we select for each $k = 1, \dots, m$ a different child $\hat{n}_k$ of $n_i$ and append $(\pi, I_k, e_k; \ell_k, V_k; 0)$ to the child's entry sequence (see Fig. 4). If $m' > m$ then we add a dummy entry to all children that were not selected (to which no new entry was added). Finally, we set $p$ to 1.
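The child update with dummy entries can be sketched as follows. The sketch makes several simplifying assumptions: each child is reduced to its entry sequence (a Python list), `None` stands for a dummy entry, all existing children are assumed to have the same number of entries, and the $k$th successor is simply assigned to the $k$th child, whereas the text only requires that distinct children are selected.

```python
DUMMY = None  # stands for an entry containing an empty set

def update_children(children, successors):
    """Append the successors computed at the current level to the children.

    children:   list of entry sequences, one list per child (modified in place)
    successors: list of new entries, one per jump successor region
    """
    existing = len(children)
    depth = len(children[0]) if children else 0
    # Create missing children and pad them to the level of the others.
    for _ in range(len(successors) - existing):
        children.append([DUMMY] * depth)
    # Each successor gets its own child; unselected children get a dummy.
    for k, child in enumerate(children):
        child.append(successors[k] if k < len(successors) else DUMMY)
    return children
```

After the update every child has the same number of entries again, so the position of an entry in the sequence always identifies the level at which it was computed.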

If the node could be processed without discovering any bad states (or if p was already 1 and thus processing was not needed) then we update the task list as follows:


Note that if $\pi' = \epsilon$ but $j > 0$ then we have just succeeded in refining a spurious counterexample from level $j - 1$ to a safe path at level $j$ and can continue further successor computations using a lower-level configuration. This switch to a lower level happens because the children $\hat{n}_k$ of $n_i$ have fewer than $j$ entries in their queues. Now the processing is completed and the next element from the task list can be approached.

**Fig. 4.** Tree update after node refinement with changing number of child nodes and transition timing refinement.

**Fig. 5.** Partial tree refinement to remove a spurious counterexample.

If during processing $(n_i; j; \pi')$ with $n_i^j = (\pi; \ell, V; p)$ the computed flowpipe had a non-empty intersection with the set of unsafe states then we have found a counterexample at level $j$. If $j = n$ then the highest refinement level has been reached and the algorithm terminates without any conclusive answer. Otherwise, if $j < n$, we repeat the computations along the counterexample path with a higher-level configuration (see Fig. 5). This is implemented by adding $(n_0; j + 1; \pi, \pi')$ to the task list.

The main structure of the algorithm is shown in Algorithm 1.1.

#### **3.1 Incrementality**

The efficiency of the presented approach can be further improved by implementing *incrementality*: already available book-keeping and additional information gained throughout the computation can be exploited to speed up later refinements.

For example, the presented approach already keeps track of time intervals where jumps were enabled, i.e. the time intervals during which the intersection of a state set and the guard condition was non-empty. Assume we process $(n; i; \pi')$ at level $i$ with $n^i = (\pi; \ell, V; p)$ being the $i$th entry in $n$. Let $I$ be the union of all the time intervals for all flowpipe segments for which a non-empty jump successor was computed along a jump $e$. Later, when processing $(\hat{n}; j; \hat{\pi}')$ at level $j > i$ with $\hat{n}^j = (\hat{\pi}; \ell, \hat{V}; \hat{p})$ being the $j$th entry in $\hat{n}$, if the path set encoded by $\hat{\pi}$ is included in the path set encoded by $\pi$ then we need to compute jump successors along $e$ only for flowpipe segments over time intervals that have a non-empty intersection with $I$.
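The reuse of jump-enabledness intervals amounts to a simple interval filter. In the sketch below, the union $I$ is represented as a list of closed intervals `(lo, hi)`, which is an assumption made for illustration, as are the function names.

```python
def intersects(a, b):
    """Two closed intervals (lo, hi) intersect iff neither ends before the other starts."""
    return a[0] <= b[1] and b[0] <= a[1]

def segments_to_check(segments, enabled_intervals):
    """Keep only those flowpipe segments whose time interval meets an interval
    in which the jump was found to be enabled at a coarser level."""
    return [s for s in segments
            if any(intersects(s, i) for i in enabled_intervals)]
```

With segments `(0.0, 0.5)`, `(0.5, 1.0)`, `(1.0, 1.5)`, `(1.5, 2.0)` and a jump that was enabled only during `(0.6, 0.9)` at the coarser level, guard intersection has to be attempted for the single segment `(0.5, 1.0)`.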

```
analyze() {
  while (true) do
    if (task list is empty) then
      return safe
    fi;
    take an element (n_i; j; π') with n_i^j = (π; ℓ, V; p) from task list;
    if (p = 0) then
      R := computeFlowpipeSegments(ℓ, V, j)
    fi;
    if (p = 0 and R contains unsafe states) then
      if (j = n) then return unknown;
      addToTaskList((n_0; j + 1; π, π'))
    else
      if (jump depth not yet reached) then
        computeJumpSuccessorsAndUpdateTaskList(n_i, j, π', R)
      fi
    fi
  od
}
```
**Algorithm 1.1.** Reachability analysis algorithm with backtracking and refinement.
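The control flow of Algorithm 1.1 can be mirrored in a short executable sketch. Everything concrete here is an assumption made for illustration: nodes and entries are plain dictionaries, the task list is processed in FIFO order, the processed flag is set directly in the loop, and the four callbacks (`compute_flowpipe`, `is_unsafe`, `jump_depth_reached`, `compute_successors`) are hypothetical placeholders for the actual reachability computations.

```python
from collections import deque

def analyze(root, max_level, compute_flowpipe, is_unsafe,
            jump_depth_reached, compute_successors):
    """Main loop: explore, and on a counterexample restart from the root
    at the next refinement level along the counterexample path."""
    tasks = deque([(root, 0, ())])  # (node, level, path); () is the empty path
    while tasks:
        node, level, path = tasks.popleft()
        entry = node["entries"][level]
        segments = None
        if not entry["processed"]:
            segments = compute_flowpipe(entry, level)
        if segments is not None and is_unsafe(segments):
            if level == max_level:
                return "unknown"  # highest refinement level exhausted
            tasks.append((root, level + 1, entry["path"] + path))
        else:
            entry["processed"] = True
            if not jump_depth_reached(entry):
                # The callback returns the new tasks for the children.
                tasks.extend(compute_successors(node, level, path, segments))
    return "safe"
```

With a root that is unsafe at level 0 but safe at level 1 and no further jumps, the loop first schedules a refinement task and then terminates with `"safe"`; if only level 0 is available, it returns `"unknown"`.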

**Table 1.** Strategies $s_i$ with different refinement levels (lvl.). Strategies vary time step size ($\delta$) and state set representation (box, sf = support function). Strategy $s_5$ changes aggregation and clustering (n = no aggregation, c: max. number of successor nodes).


Similarly, if $(\ell, V)$ contains no unsafe states but $(\ell, \hat{V})$ does then we know that the latter counterexample is spurious if the path set encoded by $\hat{\pi}$ is included in the path set encoded by $\pi$.

A similar observation holds for flowpipe segments: if a segment in the flowpipe of $(\ell, V)$ is empty, which happens if the invariant is violated, then we know that the same segment of the flowpipe from $(\ell, \hat{V})$ will also be empty.

### **4 Experimental Results**

In order to show the general applicability of our approach we have conducted several experiments on an implementation of the method presented in Sect. 3. We have used our implementation to verify safety of several well-known benchmarks using different strategies (see Table 1). All experiments were carried out on an Intel Core i7 (4 × 4 GHz) CPU with 16 GB RAM. Results for the used strategies can be found in Table 2.

*Benchmarks.* Different benchmarks from the area of hybrid systems verification are selected: The well-known bouncing ball benchmark models the height and velocity of a falling ball bouncing off the ground. The added set of bad states constrains the height of the ball after the first of 4 bounces. This benchmark already exhibits most properties more challenging benchmarks cover while being simple enough to be a sanity check for our method.

The 5-D switching system [10] is an artificially created model with 5 locations and 5 variables which shows more complex dynamics and is well-suited to show the differences in over-approximation error between the used state set representations. We added a set of bad states in the last location where the system's trajectories converge to a certain point.

The navigation benchmark [14] models the velocity and position of a point mass moving through cells on a two-dimensional plane (we used variations of instances 9 and 11). Each cell (location) exhibits different dynamics influencing the acceleration of the mass. The goal is to show that a set of good states can potentially be reached while a set of bad states will always be avoided (see Fig. 6(b)). The initial position of the mass is chosen from a set, such that this benchmark demonstrates non-determinism for the discrete transitions, which results in a more complex search tree.

The platoon benchmark [1,4] models a vehicle platoon of three cars where two controlled cars follow the first one while keeping the distance $e_i$ between each other within a certain threshold (see Fig. 6(a)). This benchmark was chosen because it combines a higher-dimensional state space with more complex dynamics.

*Strategies.* During the development of our approach we tested several strategies with varying parameters: (a) the state set representation, (b) the time step size and (c) aggregation settings. In general, other parameters (e.g. initial set splitting) could also be considered but our prototype does not yet support these. For this evaluation we selected six strategies $s_0, \dots, s_5$ which mostly vary (a) and (b) (see Table 1). Changing aggregation settings proved challenging for the tree update mechanism, and the exponential blow-up of the number of tree nodes rendered it ineffective in practice. Furthermore, when aggregation is disabled, the largest precision gain can be observed for boxes, while for all other tested state set representations the effect can be neglected. Note that our prototype implements the general case where time step sizes are not necessarily monotonically decreasing or multiples of each other, which implies that refinement starts from the root node.

*Comparison.* We compare our refinement algorithm (1) with a classic approach where no refinement is performed. To achieve this, we specify only a single strategy element for our algorithm. We give results for (2) the fastest successful setting (of the respective strategy), which an experienced user would choose, and for (3) the setting with the highest precision level, which a conservative user would select. The three entries per cell in Table 2 show the running times for our dynamic approach (gray), the fastest successful setting and the conservative approach. The numbers in brackets show the number of nodes in the search tree; for refinement strategies we give the number of nodes for each refinement level.



*Observations.* The results in Table 2 show that our method is in general competitive with classical approaches: with dynamic refinement the running times are of the same order of magnitude as for the fastest setting, and in some cases our method is even faster. From the results we can draw several conclusions:


**Fig. 6.** Result plots for the platoon and the navigation benchmarks with refinement. (Color figure online)

per location, the effect of partial refinement can especially be observed for this benchmark. Whole subtrees can be cut off and are shown to be unreachable on higher refinement levels such that the number of nodes is reduced. The presented method is most effective for systems exhibiting nondeterminism, which is reflected in a strongly branching search tree.

– Coarse analysis allows for fast discovery of the search tree, possibly requiring more nodes to be computed. We can observe that for models with nondeterminism the number of nodes at the highest required level is lower than when using the classical approach. Together with the running times this confirms our assumption that putting effort in selective, partial refinement of single branches pays off in terms of computational effort.

In conclusion, we expect that a strategy in which a coarse analysis precedes a fine-grained setting (e.g. strategy $s_3$), which allows us to detect enabled transitions quickly and to recover fast after the removal of a spurious counterexample, shows good results on average.

### **5 Conclusion**

We presented a reachability analysis algorithm with dynamic configuration adjustment, which refines search configurations to obtain conclusive results while exploiting as much information as possible from previous computations in order to keep the computational effort as low as possible. We plan to continue our work in several directions:

*Incrementality.* Our current implementation re-uses information from previous refinement levels about the time intervals of jump enabledness. We will implement also the re-usage of information when an invariant is definitely true or definitely violated (when the flowpipe segment for a time interval was fully contained or fully outside the invariant set).

*Additional parameters.* The current implementation supports three parameters in search strategies: time-step size, state set representation, and aggregation and clustering settings. We aim to extend our search strategies with the adjustment of further parameters.

*Dynamic strategy synthesis.* Using information about a counterexample, e.g. the Hausdorff distance between the set of bad states and the state set intersecting it, automatically deriving strategies for partial path refinement could be further investigated.

*Parameter synthesis.* With little modification we can also use our approach to synthesize the coarsest parameter setting which still allows us to verify safety. This can be achieved by strategies in which the parameter settings decrease in precision and the analysis stops when a bad state is potentially reachable.

*Partial path refinement.* Partial refinement of counterexamples, for example restricted to a suffix, could possibly improve the effectiveness of the approach (if the refinement of the suffix renders a bad state unreachable).

*Conditional strategies.* We defined search strategies to be ordered sequences of parameter configurations, which are used one after the other for refinements. Introducing *trees* of configurations with conditional branching would allow even more powerful strategies where the characteristics of the system or runtime information (like previous refinement times, state set sizes, number of sets aggregated etc.) can be used to determine which branch to take for the next refinement.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **AMT 2.0: Qualitative and Quantitative Trace Analysis with Extended Signal Temporal Logic**

Dejan Ničković<sup>1(B)</sup>, Olivier Lebeltel<sup>2</sup>, Oded Maler<sup>2</sup>, Thomas Ferrère<sup>3</sup>, and Dogan Ulus<sup>2</sup>

> <sup>1</sup> Austrian Institute of Technology GmbH, Vienna, Austria dejan.nickovic@ait.ac.at <sup>2</sup> Verimag, CNRS/University of Grenoble-Alpes, Grenoble, France <sup>3</sup> IST Austria, Klosterneuburg, Austria

**Abstract.** We introduce in this paper AMT 2.0, a tool for qualitative and quantitative analysis of hybrid continuous and Boolean signals that combine numerical values and discrete events. The evaluation of the signals is based on rich temporal specifications expressed in *extended Signal Temporal Logic* (xSTL), which integrates Timed Regular Expressions (TRE) within Signal Temporal Logic (STL). The tool features qualitative monitoring (property satisfaction checking), trace diagnostics for explaining and justifying property violations and specification-driven measurement of quantitative features of the signal.

### **1 Introduction**

Cyber-physical systems, such as automotive embedded controllers, medical devices or autonomous vehicles, are often modeled and analyzed by simulation. Simulators generate real-valued traces that are often interpreted as continuous-time signals. To evaluate the system under design, these traces are inspected for satisfying some correctness requirements and are often subject to quantitative analysis based on recording some values in certain segments of the signal and performing some computation (summation, minimum) on them.

Over the past decade an extensive framework has been developed whose goal was to bring automated support for this tedious and error-prone task, centered around Signal Temporal Logic (STL) [18,19]. STL extends the classical LTL in two directions: it uses predicates over real-valued variables in addition to atomic propositions, and it is defined over dense continuous time accessed symbolically with timed modalities as in Metric Temporal Logic (MTL) [17]. This framework, which was initially accompanied by a rudimentary prototype tool [20], has many reported applications in domains such as automotive, robotics, analog circuits and systems biology. It can be viewed as an extension of *runtime verification* toward cyber-physical hybrid systems. Interested readers may consult the survey in [7].

In this article we present AMT 2.0, a new version of the tool. The new version is much more mature in terms of software engineering aspects such as rigorous typing of signals and properties, introducing programming language features that include *declarations* and *aliases*, improvement of the graphical editors, systematic software testing, etc. Furthermore, its functionality has been extended significantly by incorporating several new research results obtained over the last years:


With all these features we progress in easing the task of designers who seek to analyze a complex system based on simulations, providing them with an alternative to manual inspection or explicit programming of observers.

The rest of the paper is organized as follows. In Sect. 2 we present the xSTL specification language. Section 3 gives an overview of the tool and its main features. We illustrate the usage of AMT 2.0 in Sect. 4 with two examples. We present the related work in Sect. 5 and give concluding remarks in Sect. 6.

### **2 Extended Signal Temporal Logic**

Extended Signal Temporal Logic (xSTL) essentially combines STL with a variant of TRE. In this section, we provide the mathematical definitions of the specification language.

We denote by $P$ and $X$ finite sets of *propositional* and *data* variables, such that $|P| = m$ and $|X| = n$. Data variables are defined over an arbitrary domain $\mathbb{D}$, typically the reals or the integers. We use the notation $w : \mathbb{T} \to \mathbb{D}^n \times \mathbb{B}^m$ to represent a multi-dimensional *signal* with $\mathbb{T} = [0, d) \subseteq \mathbb{R}$ and $\mathbb{B} = \{\text{true}, \text{false}\}$. We denote by $w_p$ the *projection* of $w$ on its component $p$. We denote by $\theta : \mathbb{D}^n \to \mathbb{B}$ a *predicate* that maps valuations of variables in $X$ into $\{\text{true}, \text{false}\}$.

The syntax of an STL formula $\varphi$ with both *future* and *past* temporal operators and interpreted over $X \cup P$ is defined by the grammar

$$\varphi := p \mid \theta(x_1, \dots, x_n) \mid \neg \varphi \mid \varphi_1 \lor \varphi_2 \mid \varphi_1 \mathcal{U}_I \varphi_2 \mid \varphi_1 \mathcal{S}_I \varphi_2$$

where $p \in P$, $x_1, \ldots, x_n \in X$ and $I \subseteq \mathbb{R}^+$ is an interval. We denote by $\mathcal{U}$ the until operator that is decorated with an unbounded interval, $\mathcal{U}_{(0,\infty)}$. We use the *strict* semantics [2] for the *until* and *since* temporal operators, which allows us to define (continuous-time) *next* $\bigcirc \varphi \equiv \varphi \,\mathcal{U}\, \varphi$ and (continuous-time) *previous* $\bar{\bigcirc} \varphi \equiv \varphi \,\mathcal{S}\, \varphi$. The instantaneous *rise* and *fall* events can be derived using the rules $\uparrow \varphi \equiv \bar{\bigcirc} \neg\varphi \wedge \varphi$ and $\downarrow \varphi \equiv \bar{\bigcirc} \varphi \wedge \neg\varphi$. We derive other standard operators as follows: $\text{true} \equiv p \vee \neg p$, $\text{false} \equiv \neg\text{true}$, $\varphi_1 \wedge \varphi_2 \equiv \neg(\neg\varphi_1 \vee \neg\varphi_2)$, $\varphi_1 \to \varphi_2 \equiv \neg\varphi_1 \vee \varphi_2$, $\Diamond_I \varphi \equiv \text{true} \,\mathcal{U}_I\, \varphi$, $\bar{\Diamond}_I \varphi \equiv \text{true} \,\mathcal{S}_I\, \varphi$, $\Box_I \varphi \equiv \neg\Diamond_I \neg\varphi$, and $\bar{\Box}_I \varphi \equiv \neg\bar{\Diamond}_I \neg\varphi$.

The semantics of an STL formula with respect to a signal $w$ is described via the satisfiability relation $(w, t) \models \varphi$, indicating that the signal $w$ satisfies $\varphi$ at time point $t$, according to the following definition.

$$\begin{array}{lll} (w,t) \models p & \leftrightarrow & w_p[t] = \text{true} \\ (w,t) \models \theta(x_1,\ldots,x_n) & \leftrightarrow & \theta(w_{x_1}[t],\ldots,w_{x_n}[t]) = \text{true} \\ (w,t) \models \neg\varphi & \leftrightarrow & (w,t) \not\models \varphi \\ (w,t) \models \varphi_1 \lor \varphi_2 & \leftrightarrow & (w,t) \models \varphi_1 \text{ or } (w,t) \models \varphi_2 \\ (w,t) \models \varphi_1 \mathcal{U}_I \varphi_2 & \leftrightarrow & \exists t' \in (t+I) \cap \mathbb{T} : (w,t') \models \varphi_2 \text{ and} \\ & & \forall t < t'' < t' : (w,t'') \models \varphi_1 \\ (w,t) \models \varphi_1 \mathcal{S}_I \varphi_2 & \leftrightarrow & \exists t' \in (t-I) \cap \mathbb{T} : (w,t') \models \varphi_2 \text{ and} \\ & & \forall t' < t'' < t : (w,t'') \models \varphi_1 \end{array}$$

We now define a variant of TRE according to the following grammar:

$$r := \epsilon \mid p \mid \theta(x_1, \ldots, x_n) \mid r_1 \cdot r_2 \mid r_1 \cup r_2 \mid r_1 \cap r_2 \mid r^* \mid \langle r \rangle_I \mid r_1 \, ? \, r_2 \mid r_1 \, ! \, r_2$$

where $I$ is an interval of $\mathbb{R}^+$. The semantics of a timed regular expression $r$ with respect to a signal $w$ and times $t \le t'$ in $[0, d]$ is given in terms of a *match* relation $(w, t, t') \mathrel{|\!\equiv} r$, which indicates that the segment of $w$ between $t$ and $t'$ matches the expression. This relation is defined inductively as follows:


The last two operations associate a pre-condition (resp. post-condition) to the expression. We note that with the pre- and post-condition, we can also syntactically define rise and fall operators by using the rules $\uparrow p \equiv \neg p \,?\, \epsilon \,!\, p$ and $\downarrow p \equiv p \,?\, \epsilon \,!\, \neg p$. Extended STL specifications require regular expressions to be embedded into STL formulas. We define two operators, *begin match* ($@(r)$) and *end match* ($(r)@$), that intuitively project any signal segment $(t, t')$ that matches the expression $r$ to its beginning $t$ and its end $t'$, respectively. Thus, xSTL simply extends STL with these two operators:

$$\varphi := p \mid \theta(x_1, \dots, x_n) \mid \neg \varphi \mid \varphi_1 \lor \varphi_2 \mid \varphi_1 \mathcal{U}_I \varphi_2 \mid \varphi_1 \mathcal{S}_I \varphi_2 \mid @(r) \mid (r)@$$

and with the following semantics

$$\begin{array}{lll}(w,t) \models @(r) & \leftrightarrow & \exists t' \ge t : (w,t,t') \mathrel{|\!\equiv} r \\ (w,t) \models (r)@ & \leftrightarrow & \exists t' \le t : (w,t',t) \mathrel{|\!\equiv} r\end{array}$$

### **3 Tool Presentation**

The AMT 2.0 tool provides qualitative and quantitative analysis of simulation/measurement traces. Its input consists of two major ingredients. The first is typically a formula or a collection of formulas in xSTL specifying the desired properties (and later measurements) of a continuous signal. The second is a finite representation of the continuous signal. Input signals obtained from simulators or measurement devices are given as finite sequences of time-stamped values of the form $(t_i, w[t_i])$. The tool supports two commonly used formats: Value Change Dump (vcd) and Comma Separated Values (csv) files. To obtain continuous-time signals, values between sampling points are interpolated inside the tool to yield either piecewise-constant or piecewise-linear signals.
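The two interpolation schemes mentioned above can be illustrated with a minimal Python sketch. This is not AMT 2.0 code (the tool is written in Java); the function name `sample_at` and the list-of-pairs trace layout are assumptions made only for illustration.

```python
from bisect import bisect_right

def sample_at(samples, t, mode="constant"):
    """Value at time t of a signal given as time-sorted (time, value) samples.

    mode="constant": hold the last sample (piecewise-constant signal).
    mode="linear":   interpolate between neighbouring samples (piecewise-linear).
    """
    times = [ti for ti, _ in samples]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the sampled time range")
    i = bisect_right(times, t) - 1        # index of the last sample with time <= t
    t0, v0 = samples[i]
    if mode == "constant" or i == len(samples) - 1:
        return v0
    t1, v1 = samples[i + 1]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

# Hypothetical trace with three time-stamped values.
trace = [(0.0, 1.0), (2.0, 3.0), (4.0, 3.0)]
print(sample_at(trace, 1.0, "constant"))  # 1.0
print(sample_at(trace, 1.0, "linear"))    # 2.0
```

The choice of interpolation matters for monitoring: under the piecewise-constant reading a threshold is crossed only at sampling points, whereas the piecewise-linear reading places crossings between them.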

The tool can work either interactively via its graphical user interface (GUI) or, alternatively, in batch mode when we want to monitor against many signals or incorporate monitoring in a more sophisticated analysis procedure that may iterate over behavior-generating models and/or properties in an outer loop. Figure 1 shows the main evaluation window of the GUI which provides two main functionalities: (1) editing xSTL specifications; and (2) launching the monitoring procedure by selecting properties and signals and presenting the outcome graphically. The AMT 2.0 tool is entirely implemented in Java to facilitate its usage across different platforms and operating systems.

The tool supports three main functionalities: (1) qualitative offline monitoring of extended STL specifications; (2) localization and explanation of property violations; and (3) measurements of quantitative features of signals driven by temporal patterns expressed using TRE. In the remainder of the section we present these functionalities in more detail.

**Fig. 1.** AMT 2.0 - an overview of the graphical user interface.

#### **3.1 Specifications in AMT 2.0**

The tool facilitates specification of xSTL properties in several ways. The GUI provides an xSTL editor, depicted in Fig. 2, with syntax highlighting and line numbering. In addition, the xSTL parser implements a number of features borrowed from programming languages. This includes (1) declaration of variables and constants, (2) parameterized property templates, (3) support for Boolean, real and integer variables and (4) type checking with extensive error reporting.

#### **3.2 Qualitative Monitoring of xSTL**

In this section, we sketch the algorithm for the major functionality of the tool, qualitative monitoring of xSTL specifications. The procedure is based on two main methods that we describe in the sequel: the offline marking procedure for STL [19] and the pattern matching procedure for TRE [22].

**Fig. 2.** AMT 2.0 - xSTL editor.

The qualitative monitoring procedure for STL is an offline method that works directly on the input signals. The procedure is recursive on the structure of the specification: it propagates the truth values from input signals via sub-formulas up to the main formula. The algorithm uses the notion of a *satisfaction signal*: we assign to each sub-formula $\psi$ of $\varphi$ a Boolean signal $w_\psi$ such that $w_\psi[t] = \text{true}$ iff $(w, t) \models \psi$. For each STL operator, we define a method that computes its satisfaction signal from the satisfaction signals of its arguments. For some operators, this computation is trivial. For example, the satisfaction signal $w_{\neg\varphi}$ is obtained by flipping the truth values of the satisfaction signal $w_\varphi$. The computation of satisfaction signals for temporal operators is more involved. We give an intuition on the computation of $w_\psi$ where $\psi = \Diamond_I \varphi$ and refer the reader to [19] for the technical description of the complete procedure. The computation is based on the following observation: whenever $\varphi$ holds throughout an interval $J$, $\psi$ holds throughout $(J \ominus I) \cap \mathbb{T}$, where $J \ominus I = \{t - t' \mid t \in J \text{ and } t' \in I\}$ is the Minkowski difference. Hence, the essence of the procedure is to back-shift (Minkowski difference restricted to $\mathbb{T}$) all the positive intervals in $w_\varphi$ and thus obtain the set of time points where $\Diamond_I \varphi$ holds. This method is illustrated in Fig. 3.

**Fig. 3.** Example of satisfaction signal computation for $\Diamond_{[1,2]} p$ using back-shifting.
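The back-shifting step can be sketched in a few lines of Python. This is a simplified illustration, not the procedure of [19]: it assumes that positive intervals are half-open pairs `(a, b)` standing for $[a, b)$, that $I = [lo, hi]$ is closed, and it ignores the open/closed endpoint subtleties of the strict semantics. The function name `eventually` is hypothetical.

```python
def eventually(pos_intervals, lo, hi, d):
    """Positive intervals of <>_[lo,hi] phi, given those of phi, on the domain [0, d)."""
    shifted = []
    for a, b in pos_intervals:
        # Minkowski difference [a, b) - [lo, hi] = [a - hi, b - lo), clipped to [0, d).
        start, end = max(a - hi, 0.0), min(b - lo, d)
        if start < end:
            shifted.append((start, end))
    # Merge overlapping or adjacent intervals into a normal form.
    shifted.sort()
    merged = []
    for start, end in shifted:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# p holds on [3, 5); then <>_[1,2] p holds on [1, 4).
print(eventually([(3.0, 5.0)], 1.0, 2.0, 10.0))  # [(1.0, 4.0)]
```

Because each positive interval is handled independently and the result is merged afterwards, the cost is dominated by sorting the shifted intervals.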

The integration of TRE into the monitoring procedure of xSTL is done in two steps. First, we define the *match-set* $\mathcal{M}(r, w)$ of a TRE $r$ over a signal $w$ as the set of all segments of $w$ that match $r$, i.e. $\mathcal{M}(r, w) = \{(t, t') \mid (w, t, t') \mathrel{|\!\equiv} r\}$, and use the algorithm of [22] to compute the match-set. We then use the match begin ($@(r)$) and match end ($(r)@$) operators to project the match-sets to satisfaction signals that are then directly integrated into the STL monitoring procedure described above.

The algorithm proposed in [22] computes the set of segments of a signal $w$ that match a TRE $\varphi$. Since we are dealing with continuous-time signals, the number of segments is uncountable and so is potentially the number of matches. The algorithm is based on the observation that all those segments can be embedded in two-dimensional space, inside the triangle $0 \le t \le t' \le |w|$, where a point $(t, t')$ represents the segment starting at $t$ and ending at $t'$. The matching algorithm uses a symbolic representation of the matches as a finite union of two-dimensional *zones*. Zones are a special class of convex polytopes defined as conjunctions of inequalities of the form $x_i \prec b_i$ and $x_i - x_j \prec c_{i,j}$, where ${\prec} \in \{<, \le\}$. For instance, the match set $\mathcal{M}(\epsilon, w)$ for the empty word is the diagonal zone $\{(t, t') \in \mathbb{T} \times \mathbb{T} \mid t = t'\}$, while the match set for a literal $p$ or $\neg p$ is a disjoint union of triangles touching the diagonal, whose number depends on the number of switching points in $w_p$. The match set of the time restriction operator is obtained by intersecting the match set with the corresponding diagonal band, hence $\mathcal{M}(\langle\varphi\rangle_I, w) = \mathcal{M}(\varphi, w) \cap \{(t, t') \mid t' - t \in I\}$. The match sets for $p$ and $\langle p \rangle_{[1,2]}$ are depicted in Fig. 4. We point the reader to [22] for a complete description of the procedure. The satisfaction signals $w_{@(r)}$ and $w_{(r)@}$ for the match-begin and match-end operators are computed from the match set of $r$ by projecting every $(t, t') \in \mathcal{M}(r, w)$ on $t$ and $t'$, respectively.

**Fig. 4.** Example of a match set: (a) $p$; and (b) $\langle p \rangle_{[1,2]}$.
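To make the zone operations concrete, the following Python sketch represents a single two-dimensional zone with closed bounds only, applies a time restriction, and projects the result on begin and end times. It is a simplified illustration under stated assumptions, not the algorithm of [22]: the `Zone` class, its field names and the restriction to non-strict inequalities are all hypothetical.

```python
from dataclasses import dataclass
import math

@dataclass(frozen=True)
class Zone:
    """Closed zone: b_lo <= t <= b_hi, e_lo <= t' <= e_hi, d_lo <= t' - t <= d_hi."""
    b_lo: float
    b_hi: float
    e_lo: float
    e_hi: float
    d_lo: float = 0.0
    d_hi: float = math.inf

def literal_zone(a, b):
    """Matches of a literal holding throughout [a, b]: the triangle a <= t <= t' <= b."""
    return Zone(a, b, a, b)

def restrict(z, lo, hi):
    """Time restriction <r>_[lo,hi]: intersect with the band lo <= t' - t <= hi."""
    return Zone(z.b_lo, z.b_hi, z.e_lo, z.e_hi, max(z.d_lo, lo), min(z.d_hi, hi))

def project_begin(z):
    """Interval of begin times t that have some matching end time t' (None if empty)."""
    lo = max(z.b_lo, z.e_lo - z.d_hi)
    hi = min(z.b_hi, z.e_hi - z.d_lo)
    ok = lo <= hi and z.d_lo <= z.d_hi and z.e_lo <= z.e_hi
    return (lo, hi) if ok else None

def project_end(z):
    """Interval of end times t' that have some matching begin time t (None if empty)."""
    lo = max(z.e_lo, z.b_lo + z.d_lo)
    hi = min(z.e_hi, z.b_hi + z.d_hi)
    ok = lo <= hi and z.d_lo <= z.d_hi and z.b_lo <= z.b_hi
    return (lo, hi) if ok else None

# p holds throughout [0, 5]; restrict matches to durations in [1, 2].
z = restrict(literal_zone(0.0, 5.0), 1.0, 2.0)
print(project_begin(z))  # (0.0, 4.0)
print(project_end(z))    # (1.0, 5.0)
```

A full match set would be a finite union of such zones, and the satisfaction signal of a match-begin or match-end operator would be the union of the projected intervals.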

#### **3.3 Trace Diagnostics for STL**

The trace diagnostics procedure implements the algorithm presented in [13]. Given an STL formula $\varphi$ and a trace $w$ that violates $\varphi$, the procedure gives an explanation of the fault in the form of a *temporal implicant*, which is a small sub-signal $w'$ of $w$ that is sufficient to imply violation. In other words, any possible completion of $w'$ into a full signal will violate the property. The diagnostics procedure uses the satisfaction signals computed by the monitoring algorithm from Sect. 3.2 to explain the faults. The method uses the *satisfaction explanation* operator $E$ (and its dual *violation explanation* operator $F$) that for a given formula $\varphi$ returns an implicant of $\varphi$ (respectively of $\neg\varphi$) which is satisfied by $w$. The explanation operators are defined inductively on the structure of the formula $\varphi$ and on the times $t$ at which explanations of its sub-formulas are required.

We illustrate the idea behind the procedure with the following example. Consider the STL specification $\varphi = \Diamond_{[0,1]} p$ and a signal $w$ in which $p$ does not hold during $[0, 3)$ and then holds during $[3, 5)$. It is clear, for instance, that $(w, 0) \not\models \varphi$ and $(w, 3) \models \varphi$. The violation of $\varphi$ by $w$ at time 0 can be explained by the fact that $w_p$ is continuously false throughout the interval $[0, 1]$. In other words, we have that $F(\varphi, w, 0) = \bigwedge_{t \in [0,1]} (w_p[t] = \text{false})$. In contrast, the value of $w$ at *any* time $t \in [3, 4]$ is sufficient to explain the satisfaction of $\varphi$ by $w$ at time 3. Thus $E(\varphi, w, 3)$ could be any $(w_p[t] = \text{true})$ such that $t \in [3, 4]$. We use the notion of a *selection function* to choose one explanation when there are many possible ones. The full algorithm is described in [13].
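The example above can be reproduced with a small Python sketch restricted to formulas of the form $\Diamond_{[lo,hi]} p$. It is only an illustration of the asymmetry between the two explanation operators, not the algorithm of [13]: the function name, the tuple-based return format and the "earliest witness" selection function are assumptions.

```python
def explain_eventually(pos_intervals, lo, hi, t, d):
    """Explain <>_[lo,hi] p at time t for a signal on [0, d).

    pos_intervals lists the half-open intervals [a, b) on which p holds.
    Returns ("E", witness_time) when the formula is satisfied at t, and
    ("F", (start, end)) when it is violated; in the latter case p must be
    false on the whole window, so the whole window is the explanation.
    """
    start, end = t + lo, min(t + hi, d)
    # Selection function (an assumption): pick the earliest witness in the window.
    for a, b in sorted(pos_intervals):
        witness = max(a, start)
        if witness < b and witness <= end:
            return ("E", witness)
    return ("F", (start, end))

# p is false on [0, 3) and true on [3, 5), as in the example above.
p_true = [(3.0, 5.0)]
print(explain_eventually(p_true, 0.0, 1.0, 0.0, 5.0))  # ('F', (0.0, 1.0))
print(explain_eventually(p_true, 0.0, 1.0, 3.0, 5.0))  # ('E', 3.0)
```

A satisfaction is justified by a single point, while a violation needs the full window; this is why violation explanations of nested temporal operators tend to be larger than satisfaction explanations.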

#### **3.4 Specification-Driven Measurements**

In this section, we present a simple declarative measurement specification language [14] built on top of TRE. The idea is to require the signal segments over which measurements should be taken to be those that match some pattern specified by an expression. An example of a measurement is the time elapsed between the beginning and end of some activity, or the total fuel consumption in a segment where the acceleration pedal is continuously on until the velocity crosses some threshold.

We first recall that the match set of a TRE defines all the trace segments that match the expression, and the number of those can be uncountably infinite. However if we restrict ourselves to patterns that are delimited by instantaneous discrete events, we will have only finitely many matches. Formally, we use the following sub-class of expressions. An *event-bounded* TRE (e-tre) is an expression of the form

$$\hat{r} := \uparrow p \mid \downarrow p \mid \hat{r}_1 \cdot r \cdot \hat{r}_2 \mid \hat{r}_1 \cup \hat{r}_2 \mid \hat{r}_1 \cap r$$

with $p$ a proposition, and $\hat{r}_1$, $\hat{r}_2$ event-bounded TREs.

The *measure patterns* defining the segments to be measured are of the form $\alpha \,?\, \psi \,!\, \beta$, where $\psi$ is the *main* pattern, and $\alpha$ and $\beta$ are, respectively, pre- and post-conditions. The main pattern $\psi$ specifies the portion of the signal over which the measure is taken. To guarantee a finite number of matching segments, $\psi$ is restricted to be an e-tre, while $\alpha$ and $\beta$, which can be used to define additional constraints, are TREs.

Given a measure pattern $\varphi$ and a signal $w$, we first compute all the segments of $w$ that match $\varphi$. We then apply a measuring operator that collects specific signal values over the matched segments. A measure is written with the syntax $\text{op}(\varphi)$ with $\text{op} \in \{\text{time}, \text{value}_x, \text{duration}, \text{inf}_x, \text{sup}_x, \text{integral}_x, \text{average}_x\}$. We finally aggregate the specific measures and provide to the user the minimum, maximum and average measured value, as well as a histogram that summarizes the measurements.
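The measuring and aggregation steps can be sketched as follows, assuming that the matched segments are already available as a finite list of `(t, t')` pairs and that the measured variable is a piecewise-constant signal whose last sample is held indefinitely. The function names and the exact meaning given to each operator are assumptions for illustration; the precise definitions are those of [14].

```python
def segment_pieces(samples, t0, t1):
    """Yield (length, value) pieces of a piecewise-constant signal on [t0, t1]."""
    sentinel = [(float("inf"), None)]     # the last sample is held forever
    for (ta, va), (tb, _) in zip(samples, samples[1:] + sentinel):
        lo, hi = max(ta, t0), min(tb, t1)
        if lo < hi:
            yield hi - lo, va

def measure(op, samples, t0, t1):
    """Apply one measuring operator to the segment [t0, t1]."""
    if op == "duration":
        return t1 - t0
    pieces = list(segment_pieces(samples, t0, t1))
    if op == "inf":
        return min(v for _, v in pieces)
    if op == "sup":
        return max(v for _, v in pieces)
    integral = sum(length * v for length, v in pieces)
    if op == "integral":
        return integral
    if op == "average":
        return integral / (t1 - t0)
    raise ValueError(f"unknown measure operator: {op}")

def summarize(measures):
    """Minimum, maximum and average of the collected measures."""
    return min(measures), max(measures), sum(measures) / len(measures)

# Hypothetical signal x and two matched segments.
x = [(0.0, 1.0), (2.0, 3.0), (4.0, 0.0)]
matches = [(1.0, 3.0), (2.0, 5.0)]
durations = [measure("duration", x, t0, t1) for t0, t1 in matches]
print(summarize(durations))                 # (2.0, 3.0, 2.5)
print(measure("integral", x, 1.0, 3.0))     # 4.0
```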

We illustrate specification-driven measurement with an example from the DSI3 automotive communication protocol [16]. The micro-controller and the sensors that use the protocol communicate by sending *analog pulses* during the protocol initialization phase. The standard describes the acceptable shapes and duration of such pulses. Figure 5 depicts the specification of a *discovery response pulse* from the DSI3 standard. In particular, the standard defines the relevant thresholds ($2I_{Resp}$ and $I_{Resp}$) which are used to describe the shape, as well as the acceptable duration of the pulse's ramp ($t_1$) and its total duration ($t_2$).

To define the pulse pattern we first define the following predicates:

$$i_h \equiv i \ge 2I_{Resp} \qquad i_b \equiv I_{Resp} \le i < 2I_{Resp} \qquad i_l \equiv i < I_{Resp}$$

and then let

$$\varphi = i_l \, ? \, \uparrow (i_b) \cdot i_b \cdot i_h \cdot i_b \cdot \downarrow (i_b) \, ! \, i_l \, .$$

We finally apply the measure operation duration(ϕ) to extract the duration of the segments that match the pulse pattern.

**Fig. 5.** Discovery response pulse from DSI3.

### **4 Examples**

In this section, we introduce two running examples that we use to illustrate the features and the functionalities of AMT 2.0. The first example is concerned with a mixed-signal bounded stabilization property and is used to illustrate the qualitative monitoring and trace diagnostics functionalities. The second example demonstrates the measurement functionality as applied to jitter in a digital clock.

### **4.1 Mixed-Signal Bounded Stabilization**

**Informal Requirements.** This requirement states that after every rising edge of the Boolean *trigger*, the usually-stable analog signal *var* is allowed to oscillate under the following conditions:


**Simulation Traces.** We evaluate this requirement on 5 different simulation traces. Figure 6 depicts the Boolean *trigger* signal, as well as the 5 traces named *var0* to *var4*. We can already reason informally about the satisfaction of the bounded stabilization property by these traces:

**Fig. 6.** Bounded stabilization - input signals.


**Formal Specification in xSTL.** To define the property we first declare the Boolean variable *trigger*, as well as the real variables *var0* to *var4*. We also declare two constants *vh* and *vl*, representing the 5 V and 0.2 V thresholds, respectively. Since we evaluate the same formula over different signals, we define a generic property template *stabilization* for the bounded stabilization formula, which is the conjunction of conditions (1) and (2) of the informal requirements. The first conjunct says that the real-valued signal must not exceed 5 V. The second conjunct is a conditional formula that uses logical implication. It says that whenever the *trigger* signal is on its rising edge, the signal *x* must go below 0.2 V within 600 s and continuously remain below that threshold for at least 300 s. Each assertion is then an instantiation of the template with one of the signals *var0* to *var4*.

```
bool trigger;
real vara;
...
real vare;
const real vh = 5;
const real vl = 0.2;

template bool stabilization (bool tg, real x, real vhigh, real vlow) {
    bool result = ((x <= vhigh) and (rise (tg) -> (eventually [0:600] always [0:300] x <= vlow)));
    return result;
}

assertion one:
    always (stabilization(trigger, vara, vh, vl));
...
assertion five:
    always (stabilization(trigger, vare, vh, vl));
```
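
To build intuition for what the template asserts, the following Python sketch checks the same formula on sampled data. It is a discrete-time approximation under stated assumptions (values are inspected only at sampling points, and the `eventually` and `always` windows are truncated at the end of the trace); it is not the interval-based procedure used by the tool, and the function name `stabilization_holds` is hypothetical.

```python
def stabilization_holds(times, trigger, x, vhigh=5.0, vlow=0.2,
                        within=600.0, hold=300.0):
    """Check always(x <= vhigh and (rise(trigger) ->
    eventually[0:within] always[0:hold] x <= vlow)) at the sampling points."""
    n = len(times)

    def stays_low(j):
        # always[0:hold] x <= vlow, evaluated at sample j
        return all(x[k] <= vlow for k in range(j, n) if times[k] <= times[j] + hold)

    for i in range(n):
        if x[i] > vhigh:                      # first conjunct violated
            return False
        rising = i > 0 and trigger[i] and not trigger[i - 1]
        if rising and not any(stays_low(j) for j in range(i, n)
                              if times[j] <= times[i] + within):
            return False                      # second conjunct violated
    return True

# Hypothetical traces sampled every 100 time units; trigger rises at time 100.
times = [100.0 * i for i in range(13)]
trigger = [False] + [True] * 12
x_ok = [0.1] + [1.0] * 4 + [0.1] * 8
x_glitch = [0.1, 1.0, 1.0, 1.0, 0.1, 0.1, 1.0, 0.1, 0.1, 1.0, 0.1, 0.1, 0.1]
print(stabilization_holds(times, trigger, x_ok))      # True
print(stabilization_holds(times, trigger, x_glitch))  # False
```

In the second trace the signal keeps glitching above 0.2 V, so no 300-unit window of stability starts within 600 units of the rising edge.
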
**Qualitative Monitoring of the Specification.** We illustrate the qualitative monitoring of the property applied to the traces as done using the GUI of the tool. In the evaluation configuration window, we first specify the xSTL specification, the simulation traces and an optional alias file. In addition to setting up the inputs, we also select the Float representation of the real numbers, the Linear interpolation and the Single Explanation feature of the diagnostics module.

After evaluating the specification on the traces, we can visually depict the results, as shown in Fig. 1. The nodes in the xSTL parse tree view are expandable via a double click. By expanding the assertions node of the specification, we can see that assertion *two* is satisfied, while assertions *one*, *three*, *four* and *five* are violated. We note that we can visualize the satisfaction signals for any subproperty of the specification.

**Fault Explanation.** The fault explanation is given in the form of temporal implicants, which are (small) sub-segments of the input signals that are sufficient to imply the property violation. Figure 7 illustrates the visual output of the diagnostics procedure in AMT 2.0 for the bounded stabilization specification. The first two figures show the trace diagnostics report for the third assertion. We can see that the *trigger* signal does not contribute to the fault, but *var3* does at a single point in time within the interval [100, 150]. At that time, *var3* is greater than the invariant threshold 5 V, which explains the property violation. The last two figures show the same report, but for the fifth assertion. In this case, the fault is explained by the fact that signal *trigger* gets high at time 100 and by the values of signal *var4* at times 350, 600 and 750. We can see that the last two times coincide with the glitches, thus witnessing that *var4* never continuously holds below 0.2 V for at least 300 time units.

We note that the tool computes the fault explanations in a hierarchical manner, following the parse tree of the formula. This additional and complementary information can be quite useful in understanding the fault.

#### **4.2 Digital Clock Jitter**

**Informal Requirements.** Given a continuous-time Boolean-valued signal *clock*, a clock period is defined as a segment that starts with the rising edge of the *clock* and ends with its consecutive rising edge. The measurement specification is to measure the duration of all the clock periods matched within the *clock* signal in order to assess the clock jitter.

**Simulation Trace.** We apply the specifications to a Boolean *clock* signal, see Fig. 8.

**Fig. 7.** Bounded stabilization - fault explanation.

**Fig. 8.** Digital clock jitter - a segment of the input signal.

**Formal Specification in xSTL.** We now formalize the measurement specification for the digital clock jitter analysis in xSTL. We first declare the Boolean variable *clock*, as well as its negation *nclock*. We then specify the pattern *clock_period*, a concatenation that starts with the rising edge of *clock* (*start(clock)*), followed by an interval of positive duration where *clock* holds, followed by another interval of positive duration where *nclock* holds, and ends with the next rising edge of *clock*. Finally, we declare the actual measurement to be taken as *duration(clock_period)*, which extracts the durations of all signal segments that match the *clock_period* pattern.

```
bool clock;
bool nclock = not clock;

measurement jitter_clock_period {
    pattern clock_period = start (clock):clock:nclock: start (clock);
    measure duration (clock_period);
}

measurement jitter_clock_period_c {
    pattern clock_period_c = start (clock):{clock:nclock}[19000:21000]: start (clock);
    measure duration (clock_period_c);
}
```
**Pattern-Driven Measurements.** The visualization of the measurement specification consists of a histogram depicting the distribution of the measures taken over signal segments that match the pattern, the total number of matched segments, as well as the minimum, maximum and average value of the measures. The visual summary of the clock jitter measurement is shown in Fig. 9.

**Fig. 9.** Digital clock jitter - measurements.
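For a Boolean clock given as a list of value changes, the measurement described above amounts to collecting the distances between consecutive rising edges and summarizing them. The Python sketch below illustrates this under stated assumptions: the `(time, value)` event layout, the function names and the fixed-width binning are hypothetical, and the optional `lo`/`hi` bounds mimic the time-restricted variant of the pattern.

```python
def rising_edges(events):
    """Times at which a Boolean signal, given as (time, value) changes, goes False -> True."""
    return [t for (_, prev), (t, cur) in zip(events, events[1:]) if cur and not prev]

def clock_periods(events, lo=None, hi=None):
    """Durations between consecutive rising edges, optionally restricted to [lo, hi]."""
    edges = rising_edges(events)
    periods = [b - a for a, b in zip(edges, edges[1:])]
    if lo is not None and hi is not None:
        periods = [p for p in periods if lo <= p <= hi]
    return periods

def histogram(values, bin_width):
    """Count values per bin [k * bin_width, (k + 1) * bin_width), keyed by bin start."""
    counts = {}
    for v in values:
        start = int(v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical clock with rising edges at times 5, 25 and 47.
clock = [(0, False), (5, True), (15, False), (25, True), (36, False), (47, True)]
periods = clock_periods(clock)
print(periods)                                             # [20, 22]
print(min(periods), max(periods), sum(periods) / len(periods))  # 20 22 21.0
print(histogram(periods, 5))                               # {20: 2}
```

The spread between the minimum and maximum period is a simple jitter indicator; the histogram shows how the measured periods are distributed.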

### **5 Related Work**

Breach [11] is a MATLAB/Simulink toolbox that enables various types of STL specification analysis. In particular, Breach supports falsification-based testing, parameter synthesis and requirement mining of STL properties. S-TaLiRo [3] is another MATLAB/Simulink toolbox for robustness analysis of MTL specifications. It provides support for falsification-based testing, parameter mining, runtime verification, conformance testing, computing the worst expected robustness for stochastic systems and debugging of formal requirements. The ViSpec [15] tool, associated with S-TaLiRo, allows visual specification of MTL requirements. BIOCHAM [10] is a tool for inferring unknown (biological) model parameters from temporal logic constraints. The authors in [9] extend STL with freeze quantifiers that allow them to express oscillatory properties. Similar oscillatory properties of the heart behavior are studied using quantitative regular expressions (QRE) in [1].

Montre [21] is a prototype tool for TRE pattern matching. It provides support for both offline and online matching. AMT 2.0 implements the offline matching algorithms used by Montre and adds a specification measurement language on top of it. Montre does not provide support for STL, monitoring and trace diagnostics.

The combination of STL and TRE was inspired by the Property Specification Language (PSL) [12] and SystemVerilog Assertions (SVA) [23] standards used in digital hardware verification. Both PSL and SVA use the *suffix implication* operator to combine temporal logic with regular expressions. In contrast, we define *match begin* and *match end* operators that give us more freedom to decide whether the begin or the end of an expression match is relevant for the property. The only other work that combines temporal logic and regular expressions in the context of continuous-time applications is presented in [8], where the authors propose the *metric dynamic logic* as the specification language for reasoning about time-event sequences.

### **6 Conclusion**

We introduced in this paper the AMT 2.0 tool for qualitative and quantitative analysis of traces coming from cyber-physical systems applications. The tool uses an expressive specification language based on a combination of STL and TRE and admits qualitative monitoring, trace diagnostics and property-driven measurements as its main functionalities. The development of the tool is ongoing and a number of features are planned for the near future, in particular solving the inverse problem of finding parameters in a formula template that lead to satisfaction by a given signal or a set of signals [6].

**Acknowledgments.** This work was partially supported by project ANR-13-CESA-0008 CADMIDIA and the Productive 4.0 project (ECSEL 737459). The ECSEL Joint Undertaking receives support from the European Union's Horizon 2020 research and innovation programme and Austria, Denmark, Germany, Finland, Czech Republic, Italy, Spain, Portugal, Poland, Ireland, Belgium, France, Netherlands, United Kingdom, Slovakia, Norway.

### **References**



# **Multi-cost Bounded Reachability in MDP**

Arnd Hartmanns<sup>1</sup>, Sebastian Junges<sup>2</sup>, Joost-Pieter Katoen<sup>1,2</sup>, and Tim Quatmann<sup>2(B)</sup>

> <sup>1</sup> University of Twente, Enschede, The Netherlands <sup>2</sup> RWTH Aachen University, Aachen, Germany tim.quatmann@cs.rwth-aachen.de

**Abstract.** We provide an efficient algorithm for multi-objective model-checking problems on Markov decision processes (MDPs) with multiple cost structures. The key problem at hand is to check whether there exists a scheduler for a given MDP such that all objectives over cost vectors are fulfilled. Reachability and expected cost objectives are covered and can be mixed. Empirical evaluation shows the algorithm's scalability. We discuss the need for output beyond Pareto curves and exploit the available information from the algorithm to support decision makers.

#### **1 Introduction**

Markov decision processes [41] (MDPs) with *rewards* or *costs* are a popular model to describe planning problems under uncertainty. Planning algorithms aim to find strategies which perform well (or even optimally) for a given objective. These algorithms typically assume *that a goal is reached eventually* [41,45]. This however is unrealistic in many scenarios, e.g. due to insufficient resources or the possibility of failing actions. Furthermore, these policies often admit single runs which perform far below the user's expectation, which is unsuitable in many scenarios with high stakes. Examples range from deliveries reaching an airport after the plane's departure to more serious scenarios in e.g. wildfire management [1]. In particular, many scenarios call for minimising the probability to run out of resources before reaching the goal: while it is *beneficial* for a plane to reach its destination with low *expected* fuel consumption, it is *essential* to reach its destination with the *fixed* available amount of fuel.

Policies that optimise solely for the probability to reach a goal are mostly very expensive. Even in the presence of just a single cost structure, decision makers have to trade the success probability against the costs. This makes many planning problems inherently multi-objective [12,17]. In particular, safety properties cannot be averaged out by good performance [21]. Planning scenarios in various application areas [44] have different resource constraints. Typical examples are energy consumption and time [11], or optimal expected revenue and time [38] in robotics, and monetary cost and available capacity in logistics [17].

This work is supported by the 3TU project "Big Software on the Run", CDZ project CAP, and DFG RTG 2236 "UnRAVeL".

© The Author(s) 2018

D. Beyer and M. Huisman (Eds.): TACAS 2018, LNCS 10806, pp. 320–339, 2018. https://doi.org/10.1007/978-3-319-89963-3\_19


**Fig. 1.** Science on Mars: planning under several resource-constraints

*Illustrative Example.* Consider a simplified (discretised) version of the Mars rover task scheduling problem [11]. The task is to plan a variety of experiments for a day on Mars. The experiments vary in their success probability, time, energy consumption and their scientific value upon success. The time, energy consumption, and scientific value are uncertain and modelled by probability distributions, cf. Fig. 1(a). The objective is to achieve a minimum of daily scientific progress while limiting the risk of running out of time or out of energy. As the rover is expected to work for a longer period, we prefer a high expected scientific value.

*Contributions and approach.* This paper focuses on multi-objective cost-bounded reachability queries on MDPs, a natural setting for the aforementioned planning problems. The input is an MDP with multiple cost structures (e.g. energy, utility or time) and multiple objectives of the form "maximise/minimise the probability to reach a state in $G_i$ such that the cumulative cost for the $i$-th cost structure is below/above a threshold $b_i$". This multi-objective variant of cost-bounded reachability is PSPACE-hard [43]. The focus of this paper is on the practical side: we aim at finding a practically efficient algorithm to obtain (an approximation of) the Pareto-optimal points. To accomplish this, we adapt and generalise recent approaches for the single-objective case [27,34] towards the multi-objective setting. The basic idea of [27,34] is to *implicitly* unfold the MDP along cost epochs, and exploit the regularities of the epoch-MDPs. Prism [37] and the Modest Toolset [29] have been updated with such methods for the single-objective case and significantly outperform the explicit unfolding approach of [2,40]. This paper presents an algorithm that lifts this principle to multiple cost objectives and determines approximation errors when using value iteration. Extensions towards quantiles and expected costs are considered too. Evaluation using a prototypical implementation in Storm [20] shows promising results. In addition, we equip our algorithm with means to visualise (inspired by the recent techniques in [39]) the trade-offs between various objectives that go beyond Pareto curves; we believe that this is key to obtain better insights into multi-objective decision making. An example is given in Fig. 1(b): it depicts the probability to satisfy an objective based on the remaining energy (y-axis) and time (x-axis).

*Related work.* The analysis of single-objective (cost-bounded) reachability in MDPs is an active area of research in both the AI and formal methods communities; see, e.g., [18,35,48]. Various model checking approaches for single objectives exist. In [32], the topology of the unfolded MDP is exploited to speed up the value iteration. In [27], three different model checking approaches are explored and compared. A survey of heuristic approaches is given in [45]. A Q-learning based approach is described in [13]. An extension of this problem to the partially observable setting was considered in [14], and to probabilistic timed automata in [27]. The method from [4] computes optimal expected values under e.g. the *condition* that the goal is reached, and is thus applicable in settings where a goal is not *necessarily* reached. A similar problem is considered in [46]. For multi-objective analysis, the model checking community typically focuses on probabilities and expected costs as in the seminal works [15,22]. Implementations are typically based on a value iteration approach in [24], and have been extended to stochastic games [16], Markov automata [42], and interval MDPs [28]. Other considered cases include e.g. multi-objective mean-payoff objectives [8], objectives over instantaneous costs [10], and parity objectives [7]. Multi-objective problems for MDPs with an unknown cost-function are considered in [33]. Surveys on multi-objective decision making in AI and machine learning can be found in [44] and [47], respectively.

### **2 Preliminaries**

We write $2^S$ for the powerset of $S$. The $i$-th component of a tuple $\boldsymbol{t} = \langle v_1, \dots, v_n \rangle$ is $\boldsymbol{t}[i] \stackrel{\text{def}}{=} v_i$. A (discrete) *probability distribution* over a set $\Omega$ is a function $\mu \in \Omega \to [0, 1]$ such that $\mathrm{support}(\mu) \stackrel{\text{def}}{=} \{\, \omega \in \Omega \mid \mu(\omega) > 0 \,\}$ is countable and $\sum_{\omega \in \mathrm{support}(\mu)} \mu(\omega) = 1$. $\mathrm{Dist}(\Omega)$ is the set of all probability distributions over $\Omega$. $\mathcal{D}(s)$ is the *Dirac distribution* for $s$, defined by $\mathcal{D}(s)(s) = 1$.

**Definition 1.** *A* Markov decision process *(MDP) with $m$ cost structures is a triple $M = \langle S, T, s_{\mathit{init}} \rangle$ where $S$ is a finite set of states, $T \in S \to 2^{\mathrm{Dist}(\mathbb{N}^m \times S)}$ is the transition function, and $s_{\mathit{init}} \in S$ is the initial state. For all $s \in S$, we require that $T(s)$ is finite and non-empty.*

We write $s \to_T \mu$ for $\mu \in T(s)$ and call it a *transition*. We write $s \xrightarrow{\boldsymbol{c}}_T s'$ if additionally $\langle \boldsymbol{c}, s' \rangle \in \mathrm{support}(\mu)$. $\langle \boldsymbol{c}, s' \rangle$ is a *branch* with cost vector $\boldsymbol{c}$. If $T$ is clear from the context, we just write $\to$. Graphically, transitions are lines to a node from which branches labelled with their probability and costs lead to successor states. We may omit the node and probability for transitions into Dirac distributions.

*Example 1.* Figure 2 shows an MDP $M_{\mathit{ex}}$. From the initial state $s_0$, the choice of going towards $s_1$ or $s_2$ is nondeterministic. Either way, the probability to return to $s_0$ is 0.5, otherwise we move to $s_1$ (or $s_2$). $M_{\mathit{ex}}$ has two cost structures: Failing to move to $s_1$ has a cost of 1 for the first, and 2 for the second structure. Moving to $s_2$ yields cost 2 for the first and no cost for the second structure.
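To make Definition 1 concrete, the following Python sketch encodes an MDP with two cost structures as a mapping from states to lists of distributions over branches. It only resembles $M_{\mathit{ex}}$: the branches described in Example 1 are taken from the text, whereas the zero cost of a successful move to $s_1$, the cost of a failed move to $s_2$, and the zero-cost transitions from $s_1$ and $s_2$ back to $s_0$ are assumptions made for illustration, as are all identifier names.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# T maps each state to a finite, non-empty list of distributions.
# A distribution maps a branch (cost_vector, successor) to its probability.
T = {
    "s0": [
        # Try to move to s1: failing costs (1, 2); the success cost (0, 0) is assumed.
        {((1, 2), "s0"): HALF, ((0, 0), "s1"): HALF},
        # Try to move to s2: moving costs (2, 0); the failure cost is assumed equal.
        {((2, 0), "s0"): HALF, ((2, 0), "s2"): HALF},
    ],
    # Assumed zero-cost Dirac transitions back to the initial state.
    "s1": [{((0, 0), "s0"): Fraction(1)}],
    "s2": [{((0, 0), "s0"): Fraction(1)}],
}
S_INIT = "s0"


def is_well_formed(transitions, m):
    """Check the side conditions of Definition 1 for m cost structures."""
    for choices in transitions.values():
        if not choices:  # T(s) must be non-empty
            return False
        for mu in choices:
            if sum(mu.values()) != 1:  # probabilities must sum to one
                return False
            for cost, successor in mu:
                if len(cost) != m or min(cost) < 0 or successor not in transitions:
                    return False
    return True
```

Exact fractions are used here so that the check that each distribution sums to one does not depend on floating-point rounding.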

In the remainder of this paper, we fix a given MDP $M = \langle S, T, s_{\mathit{init}} \rangle$. Its semantics is captured by the notion of paths. A *path* in $M$ represents the infinite concrete resolution of both nondeterministic and probabilistic choices: $\pi = s_0\, \mu_0\, \boldsymbol{c}_0\, s_1\, \mu_1\, \boldsymbol{c}_1 \dots$ where $s_i \in S$, $s_i \to \mu_i$, and $\langle \boldsymbol{c}_i, s_{i+1} \rangle \in \mathrm{support}(\mu_i)$ for all $i \in \mathbb{N}$. A *finite path* $\pi_{\mathrm{fin}} = s_0\, \mu_0\, \boldsymbol{c}_0\, s_1\, \mu_1\, \boldsymbol{c}_1\, s_2 \dots \mu_{n-1}\, \boldsymbol{c}_{n-1}\, s_n$ is a finite prefix of a path with $\mathrm{last}(\pi_{\mathrm{fin}}) \stackrel{\text{def}}{=} s_n \in S$. Let $\mathrm{cost}_i(\pi_{\mathrm{fin}}) \stackrel{\text{def}}{=} \sum_{j=0}^{n-1} \boldsymbol{c}_j[i]$. $\mathrm{Paths}_{\mathrm{fin}}(M)$ and $\mathrm{Paths}(M)$ are the sets of all finite and infinite paths starting in $s_{\mathit{init}}$, respectively. A scheduler (*adversary*, *policy* or *strategy*) resolves nondeterministic choices:

**Definition 2.** *$\mathfrak{S} \in \mathrm{Paths}_{\mathrm{fin}}(M) \to \mathrm{Dist}(\mathrm{Dist}(\mathbb{N}^m \times S))$ is a* scheduler *for $M$ if $\forall\, \pi_{\mathrm{fin}} \colon \mu \in \mathrm{support}(\mathfrak{S}(\pi_{\mathrm{fin}})) \Rightarrow \mathrm{last}(\pi_{\mathrm{fin}}) \to_T \mu$. The set of all schedulers of $M$ is $\mathrm{Sched}(M)$. $\mathfrak{S}$ is* deterministic *if $|\mathrm{support}(\mathfrak{S}(\pi))| = 1$ for all finite paths $\pi$.*

Via the standard cylinder set construction [25], a scheduler $\mathfrak{S}$ induces a probability measure $\mathcal{P}^{\mathfrak{S}}_M$ on measurable sets of paths starting from $s_{\mathit{init}}$. We define the *extremal* values $\mathcal{P}^{\max}_M(\Pi) = \sup_{\mathfrak{S} \in \mathrm{Sched}(M)} \mathcal{P}^{\mathfrak{S}}_M(\Pi)$ and $\mathcal{P}^{\min}_M(\Pi) = \inf_{\mathfrak{S} \in \mathrm{Sched}(M)} \mathcal{P}^{\mathfrak{S}}_M(\Pi)$ for measurable $\Pi \subseteq \mathrm{Paths}(M)$. For clarity, we focus on probabilities in this paper, but note that expected accumulated costs can be defined analogously [25] and our methods apply to them with only minor changes.

**Cost-Bounded Reachability.** We are interested in the probabilities of sets of paths that reach certain goal states within multiple cost bounds:

**Definition 3.** *A* cost bound *is given by $\langle C_j \rangle_{\sim b}\, G$ where $j \in \{1, \dots, m\}$ identifies a cost structure, ${\sim} \in \{<, \leq, >, \geq\}$, $b \in \mathbb{N}$ is a bound value, and $G \subseteq S$ is a set of goal states. A* cost-bounded reachability formula *is a conjunction $\bigwedge_{i=1}^{n \in \mathbb{N}} (\langle C_{j_i} \rangle_{\sim_i b_i}\, G_i)$ of cost bounds. It characterises the measurable set of paths $\Pi$ where, for every $i$, every $\pi \in \Pi$ has a prefix $\pi^i_{\mathrm{fin}}$ with $\mathrm{last}(\pi^i_{\mathrm{fin}}) \in G_i$ and $\mathrm{cost}_{j_i}(\pi^i_{\mathrm{fin}}) \sim_i b_i$.*

A (single-objective) multi-cost bounded reachability query asks for $\mathcal{P}^{\mathit{opt}}_M(e)$ where $\mathit{opt} \in \{\max, \min\}$ and $e$ is a cost-bounded reachability formula. Unbounded and step-bounded reachability are special cases of cost-bounded reachability. A single-objective query may contain multiple bounds, but asks for a *single* scheduler that optimises the probability of satisfying them all.

We also consider multi-objective *tradeoffs*, i.e. sets of single-objective queries written as $\Phi = \mathit{multi}\big(\mathcal{P}^{\mathit{opt}_1}_M(e_1), \dots, \mathcal{P}^{\mathit{opt}_\ell}_M(e_\ell)\big)$. We call the $e_k$ *objectives*. For tradeoffs, we are interested in the *Pareto curve* $\mathit{Pareto}(M, \Phi)$ which consists of all achievable probability vectors $\boldsymbol{p}_{\mathfrak{S}} = \langle \mathcal{P}^{\mathfrak{S}}_M(e_1), \dots, \mathcal{P}^{\mathfrak{S}}_M(e_\ell) \rangle$ for $\mathfrak{S} \in \mathrm{Sched}(M)$ that are not *dominated* by another achievable vector $\boldsymbol{p}_{\mathfrak{S}'}$. More precisely, $\boldsymbol{p}_{\mathfrak{S}} \in \mathit{Pareto}(M, \Phi)$ iff for all $\mathfrak{S}' \in \mathrm{Sched}(M)$ either $\boldsymbol{p}_{\mathfrak{S}} = \boldsymbol{p}_{\mathfrak{S}'}$ or for some $i \in \{1, \dots, \ell\}$ we have $(\mathit{opt}_i = \max \wedge \boldsymbol{p}_{\mathfrak{S}}[i] > \boldsymbol{p}_{\mathfrak{S}'}[i]) \vee (\mathit{opt}_i = \min \wedge \boldsymbol{p}_{\mathfrak{S}}[i] < \boldsymbol{p}_{\mathfrak{S}'}[i])$.

**Fig. 2.** Example MDP $M_{\mathit{ex}}$

**Fig. 3.** An illustration of epochs

*Example 2.* We consider $\Phi = \mathit{multi}\big(\mathcal{P}^{\max}_{M_{\mathit{ex}}}(\langle C_1 \rangle_{\leq 1} \{s_1\}), \mathcal{P}^{\max}_{M_{\mathit{ex}}}(\langle C_2 \rangle_{\leq 3} \{s_2\})\big)$ for $M_{\mathit{ex}}$ of Fig. 2. Let $\mathfrak{S}_j$ be the scheduler that tries to move to $s_1$ for at most $j$ attempts and afterwards moves to $s_2$. The induced probability vectors $\boldsymbol{p}_{\mathfrak{S}_1} = \langle 0.5, 1 \rangle$ and $\boldsymbol{p}_{\mathfrak{S}_2} = \langle 0.75, 0.75 \rangle$ both lie on the Pareto curve since no $\mathfrak{S} \in \mathrm{Sched}(M_{\mathit{ex}})$ induces (strictly) larger probabilities $\boldsymbol{p}_{\mathfrak{S}}$. By also considering schedulers that randomise between the choices of $\mathfrak{S}_1$ and $\mathfrak{S}_2$ we obtain $\mathit{Pareto}(M_{\mathit{ex}}, \Phi) = \{\, w \cdot \boldsymbol{p}_{\mathfrak{S}_1} + (1 - w) \cdot \boldsymbol{p}_{\mathfrak{S}_2} \mid w \in [0, 1] \,\}$.
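The dominance relation behind the Pareto curve can be checked mechanically. The following Python sketch, for maximising objectives only, tests that the two probability vectors from Example 2 do not dominate each other and computes one randomised mixture; the function names `dominates` and `mix` are illustrative and not taken from the paper.

```python
def dominates(q, p):
    """True if q is at least as good as p in every entry and strictly better in one.

    All objectives are assumed to be maximised.
    """
    return all(a >= b for a, b in zip(q, p)) and any(a > b for a, b in zip(q, p))


def mix(w, p, q):
    """Point obtained by putting weight w on p and weight 1 - w on q."""
    return tuple(w * a + (1 - w) * b for a, b in zip(p, q))


p1 = (0.5, 1.0)    # probability vector of the first scheduler in Example 2
p2 = (0.75, 0.75)  # probability vector of the second scheduler in Example 2

# Randomising evenly between the two schedulers yields a point between them.
midpoint = mix(0.5, p1, p2)
```

Neither of the two vectors dominates the other or the midpoint, which is consistent with all three lying on the line segment given in the example.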

For clarity of presentation, we restrict to tradeoffs Φ where every cost structure occurs exactly once, i.e., the number m of cost structures of M matches the number of cost bounds occurring in Φ. Furthermore, we require that none of the sets of goal states contains the initial state. Both assumptions are w.l.o.g. by copying cost structures as needed and adding a new initial state with zero-cost transition to the old initial state.

### **3 Multi-dimensional Sequential Value Iteration**

We present a practically efficient approach to compute (an approximation of) the Pareto curve for MDP $M$ with $m$ cost structures and tradeoff $\Phi$. We merge the ideas of [24] to approximate a Pareto curve for an (unbounded) multi-objective tradeoff with those of [27,34] to efficiently compute (single-objective) cost-bounded reachability probabilities. For clarity of presentation we start with the upper-bounded maximum case and assume a tradeoff of the form $\Phi = \mathit{multi}\big(\mathcal{P}^{\max}_M(e_1), \dots, \mathcal{P}^{\max}_M(e_\ell)\big)$ with $e_k = \bigwedge_{i=n_{k-1}}^{n_k - 1} (\langle C_i \rangle_{\leq b_i}\, G_i)$ and $0 = n_0 < n_1 < \dots < n_\ell = m$. Other variants are discussed in Sect. 3.3.

*Cost epochs and goal satisfaction.* Central to our approach is the concept of *cost epochs*. Consider the path $\pi = (s_0\, \langle 2, 0 \rangle\, s_2\, \langle 0, 0 \rangle\, s_0\, \langle 1, 2 \rangle)^\omega$ through $M_{\mathit{ex}}$ of Fig. 2. We plot the accumulated cost in both dimensions along this path in Fig. 3(a). Starting from $\langle 0, 0 \rangle$, the first transition yields cost 2 for the first cost structure: we jump to coordinate $\langle 2, 0 \rangle$. The next transition, back to $s_0$, has no cost, so we stay at $\langle 2, 0 \rangle$. Finally, the failed attempt to move to $s_1$ incurs costs $\langle 1, 2 \rangle$. Consequently, for an infinite path, infinitely many points in this grid may be reached. However, a tradeoff specifies bound values for the costs, e.g., for $\Phi_{\mathit{ex}} = \mathit{multi}\big(\mathcal{P}^{\max}_{M_{\mathit{ex}}}(\langle C_1 \rangle_{\leq 4} \{s_1\}), \mathcal{P}^{\max}_{M_{\mathit{ex}}}(\langle C_2 \rangle_{\leq 3} \{s_2\})\big)$ we get bound values 4 and 3. Once the bound value for a bound is reached, accumulating further costs in this dimension does not impact the satisfaction of its formula. It thus suffices to keep track, for each bound, of the *remaining* costs before reaching the bound value. This leads to a finite grid as depicted in Fig. 3(b). We refer to each of its coordinates as a cost epoch:

**Definition 4.** *An $m$-dimensional* cost epoch *is a tuple in $\boldsymbol{E}_m \stackrel{\text{def}}{=} (\mathbb{N} \cup \{\bot\})^m$. For $\boldsymbol{e} \in \boldsymbol{E}_m$, $\boldsymbol{c} \in \mathbb{N}^m$, the* successor epoch *is $\mathit{succ}(\boldsymbol{e}, \boldsymbol{c})[i] \stackrel{\text{def}}{=} \boldsymbol{e}[i] - \boldsymbol{c}[i]$ if that value is non-negative and $\bot$ otherwise.*
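As a small illustration of Definition 4, the following Python sketch computes successor epochs and replays the first four cost vectors of the example path, starting from the bound values 4 and 3. Representing $\bot$ by `None` and the name `succ_epoch` are choices made for this sketch only.

```python
BOT = None  # stands for the symbol ⊥


def succ_epoch(e, c):
    """Successor epoch: subtract the costs and map exceeded bounds to ⊥."""
    result = []
    for remaining, cost in zip(e, c):
        if remaining is BOT or remaining - cost < 0:
            result.append(BOT)  # the bound is (or already was) exceeded
        else:
            result.append(remaining - cost)
    return tuple(result)


# One round of the periodic example path, followed by the next move to s2.
epoch = (4, 3)
trace = [epoch]
for cost in [(2, 0), (0, 0), (1, 2), (2, 0)]:
    epoch = succ_epoch(epoch, cost)
    trace.append(epoch)
```

After the fourth step the first entry becomes $\bot$, since the remaining budget of 1 is smaller than the incurred cost of 2.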

If the entry for a bound is $\bot$, it cannot be satisfied any more: too much cost has already been incurred. To check whether an objective $e_k = \bigwedge_{i=n_{k-1}}^{n_k - 1} (\langle C_i \rangle_{\leq b_i}\, G_i)$ is satisfied, we memorise whether each individual bound already holds. This is also used to ensure that satisfying a bound more than once has no effect.

**Definition 5.** *A* goal satisfaction *$\boldsymbol{g} \in \boldsymbol{G}_m \stackrel{\text{def}}{=} \{0, 1\}^m$ represents the cost structure indices $i$ for which bound $\langle C_i \rangle_{\leq b_i}\, G_i$ already holds, i.e. $G_i$ was reached before the bound value $b_i$. For $\boldsymbol{g} \in \boldsymbol{G}_m$, $\boldsymbol{e} \in \boldsymbol{E}_m$ and $s \in S$, let $\mathit{succ}(\boldsymbol{g}, s, \boldsymbol{e}) \in \boldsymbol{G}_m$ define the update upon reaching $s$: $\mathit{succ}(\boldsymbol{g}, s, \boldsymbol{e})[i] = 1$ if $s \in G_i \wedge \boldsymbol{e}[i] \neq \bot$ and $\mathit{succ}(\boldsymbol{g}, s, \boldsymbol{e})[i] = \boldsymbol{g}[i]$ otherwise.*
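The goal satisfaction update for upper bounds can be sketched in Python as follows. An entry is set to 1 when a goal state of the corresponding bound is reached while its epoch entry is not $\bot$, i.e. while the bound value has not been exceeded. Indices are 0-based here, $\bot$ is represented by `None`, and the goal sets are those of the running example; all names are illustrative.

```python
BOT = None  # stands for the symbol ⊥


def succ_goal(g, s, e, goals):
    """Update the goal satisfaction vector g upon reaching state s in epoch e.

    goals[i] is the goal set of the i-th (upper) cost bound.
    """
    return tuple(
        1 if (s in goals[i] and e[i] is not BOT) else g[i]
        for i in range(len(g))
    )


goals = [{"s1"}, {"s2"}]  # goal sets of the two bounds in the running example

g_after_s2 = succ_goal((0, 0), "s2", (2, 3), goals)    # second bound now holds
g_too_late = succ_goal((0, 1), "s1", (BOT, 1), goals)  # first bound exceeded
g_both = succ_goal((0, 1), "s1", (1, 1), goals)        # first bound now holds
```

Entries that are already 1 are never reset, so satisfying a bound more than once has no effect.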

### **3.1 The Unfolding Approach**

*Pareto*(M,Φ) can be computed by reducing Φ to a multi-objective *unbounded* reachability problem on the *unfolded* MDP. Its states are the Cartesian product of the original MDP's states, the epochs, and the goal satisfactions:

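**Definition 6.** *The* unfolding *for $M$ as in Definition 1 and upper-bounded maximum tradeoff $\Phi$ is the MDP $M_{\mathit{unf}} = \langle S', T', \langle s_{\mathit{init}}, \langle b_1, \dots, b_m \rangle, \boldsymbol{0} \rangle \rangle$ with no cost structures, $S' \stackrel{\text{def}}{=} S \times \boldsymbol{E}_m \times \boldsymbol{G}_m$, $T'(\langle s, \boldsymbol{e}, \boldsymbol{g} \rangle) \stackrel{\text{def}}{=} \{\, \mathit{unf}(\mu) \in \mathrm{Dist}(\mathbb{N}^0 \times S') \mid \mu \in T(s) \,\}$ and the unfolding of probability distribution $\mu$ defined by $\mathit{unf}(\mu)(\langle s', \boldsymbol{e}', \boldsymbol{g}' \rangle) = \mu(\langle \boldsymbol{c}, s' \rangle)$ if $\boldsymbol{e}' = \mathit{succ}(\boldsymbol{e}, \boldsymbol{c}) \wedge \boldsymbol{g}' = \mathit{succ}(\boldsymbol{g}, s', \boldsymbol{e}')$ and $0$ otherwise.*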

Costs are now encoded in the state space, so it suffices to consider the unbounded tradeoff $\Phi' = \mathit{multi}\big(\mathcal{P}^{\max}_{M_{\mathit{unf}}}(e'_1), \dots, \mathcal{P}^{\max}_{M_{\mathit{unf}}}(e'_\ell)\big)$ with $e'_k = \langle \cdot \rangle_{\geq 0}\, G'_k$ and $G'_k = \{\, \langle s, \boldsymbol{e}, \boldsymbol{g} \rangle \mid \bigwedge_{i=n_{k-1}}^{n_k - 1} \boldsymbol{g}[i] = 1 \,\}$.

**Lemma 1.** *There is a bijection $f \colon \mathrm{Sched}(M) \to \mathrm{Sched}(M_{\mathit{unf}})$ with $\mathcal{P}^{\mathfrak{S}}_M(e_k) = \mathcal{P}^{f(\mathfrak{S})}_{M_{\mathit{unf}}}(e'_k)$ for all $\mathfrak{S} \in \mathrm{Sched}(M)$ and $k \in \{1, \dots, \ell\}$. Consequently, we have that $\mathit{Pareto}(M, \Phi) = \mathit{Pareto}(M_{\mathit{unf}}, \Phi')$.*

$\mathit{Pareto}(M_{\mathit{unf}}, \Phi')$ can be computed with existing multi-objective model checking algorithms for unbounded reachability. We build on the one of [24]. It iteratively chooses weight vectors $\boldsymbol{w} = \langle w_1, \dots, w_\ell \rangle \in [0, 1]^\ell \setminus \{\boldsymbol{0}\}$ and computes points

$$\boldsymbol{p}_{\boldsymbol{w}} = \langle \mathcal{P}^{\mathfrak{S}}_{M_{\mathit{unf}}}(e'_1), \dots, \mathcal{P}^{\mathfrak{S}}_{M_{\mathit{unf}}}(e'_\ell) \rangle \text{ with } \mathfrak{S} \in \arg\max_{\mathfrak{S}'} \left( \sum_{k=1}^{\ell} w_k \cdot \mathcal{P}^{\mathfrak{S}'}_{M_{\mathit{unf}}}(e'_k) \right). \tag{1}$$

The Pareto curve $P$ is convex, $\boldsymbol{p}_{\boldsymbol{w}} \in P$ for all $\boldsymbol{w}$, and $\boldsymbol{q} \in P$ implies $\boldsymbol{q} \cdot \boldsymbol{w} \leq \boldsymbol{p}_{\boldsymbol{w}} \cdot \boldsymbol{w}$. These observations allow us to approximate the Pareto curve with arbitrary precision; see [24] for details. [24] characterises $\boldsymbol{p}_{\boldsymbol{w}}$ via weighted expected costs: $M_{\mathit{unf}}$ is equipped with $\ell$ cost structures used to calculate the probability of each of the objectives. This is achieved by setting the value of the $k$-th cost structure on each branch to 1 iff the objective $e'_k$ is satisfied in the target state of the branch but was *not* satisfied in the transition's source state. On a path $\pi$ through the resulting model $M^+_{\mathit{unf}}$, we collect exactly one cost w.r.t. cost structure $k$ iff $\pi$ satisfies objective $e_k$.
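The role of the weight vector can be illustrated on a finite set of candidate points. The Python sketch below is not the algorithm of [24]; it merely selects, among the two probability vectors of Example 2, the one that maximises the weighted sum and thereby yields the upper bound $\boldsymbol{q} \cdot \boldsymbol{w} \leq \boldsymbol{p}_{\boldsymbol{w}} \cdot \boldsymbol{w}$ for every other candidate $\boldsymbol{q}$. The function names are illustrative.

```python
def weighted_sum(w, p):
    """Dot product of a weight vector and a probability vector."""
    return sum(wi * pi for wi, pi in zip(w, p))


def best_point(w, points):
    """Candidate point that maximises the weighted sum for weight vector w."""
    return max(points, key=lambda p: weighted_sum(w, p))


# Candidate points: the two probability vectors from Example 2.
points = [(0.5, 1.0), (0.75, 0.75)]

favour_first = best_point((1.0, 0.0), points)   # only the first objective counts
favour_second = best_point((0.0, 1.0), points)  # only the second objective counts
```

Different weight vectors select different points, which is how repeated queries with varying weights trace out an approximation of the curve.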

**Definition 7.** *For $\mathfrak{S} \in \mathrm{Sched}(M^+_{\mathit{unf}})$ and $\boldsymbol{w} \in [0, 1]^\ell$, the* weighted expected cost *is $\mathcal{E}^{\mathfrak{S}}_{M^+_{\mathit{unf}}}(\boldsymbol{w}) = \sum_{k=1}^{\ell} \boldsymbol{w}[k] \cdot \int_{\pi \in \mathrm{Paths}(M)} \mathrm{cost}_k(\pi) \,\mathrm{d}\mathcal{P}^{\mathfrak{S}}_{M^+_{\mathit{unf}}}(\pi)$, i.e. the expected value of the weighted sum of the costs accumulated on paths in $M^+_{\mathit{unf}}$.*

The following characterisation of $\boldsymbol{p}_{\boldsymbol{w}}$ is equivalent to Eq. 1:

$$\boldsymbol{p}_{\boldsymbol{w}} = \langle \mathcal{E}^{\mathfrak{S}}_{M^+_{\mathit{unf}}}(\boldsymbol{1}_1), \dots, \mathcal{E}^{\mathfrak{S}}_{M^+_{\mathit{unf}}}(\boldsymbol{1}_\ell) \rangle \quad \text{where} \quad \mathfrak{S} \in \arg\max_{\mathfrak{S}'} \mathcal{E}^{\mathfrak{S}'}_{M^+_{\mathit{unf}}}(\boldsymbol{w}) \tag{2}$$

and $\boldsymbol{1}_k \in \{0, 1\}^\ell$ is the weight vector defined by $\boldsymbol{1}_k[j] = 1$ iff $j = k$. Standard MDP model checking algorithms [41] can be applied to compute an optimal (deterministic and memoryless) scheduler $\mathfrak{S}$ and the induced costs $\mathcal{E}^{\mathfrak{S}}_{M^+_{\mathit{unf}}}(\boldsymbol{1}_k)$.

#### **3.2 An Epoch Model Approach Without Unfolding**

The unfolding approach does not scale well: If the original MDP has $n$ states, the unfolding will have on the order of $n \cdot \prod_{i=1}^{m} (b_i + 2)$ states. This makes it infeasible for larger bound values $b_i$ over multiple bounds. The bottleneck lies in computing the points $\boldsymbol{p}_{\boldsymbol{w}}$ as in Eqs. 1 and 2. We now show how to do so efficiently, i.e. given a weight vector $\boldsymbol{w} = \langle w_1, \dots, w_\ell \rangle \in [0, 1]^\ell \setminus \{\boldsymbol{0}\}$, compute

$$\boldsymbol{p}_{\boldsymbol{w}} = \langle \mathcal{P}^{\mathfrak{S}}_{M}(e_1), \dots, \mathcal{P}^{\mathfrak{S}}_{M}(e_\ell) \rangle \text{ with } \mathfrak{S} \in \arg\max_{\mathfrak{S}'} \left( \sum_{k=1}^{\ell} w_k \cdot \mathcal{P}^{\mathfrak{S}'}_{M}(e_k) \right) \tag{3}$$

without unfolding. The characterisations of $\boldsymbol{p}_{\boldsymbol{w}}$ given in Eqs. 1 and 3 are equivalent due to Lemma 1.
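To give a feeling for the state-count estimate at the start of this subsection, the following Python sketch evaluates $n \cdot \prod_{i=1}^{m} (b_i + 2)$ for given bound values. The function name and the sample numbers are illustrative only.

```python
from math import prod


def unfolding_size_estimate(n, bounds):
    """Order-of-magnitude state count of the unfolding: n * prod(b_i + 2).

    Each bound b_i contributes the epoch entries 0, ..., b_i and ⊥.
    """
    return n * prod(b + 2 for b in bounds)


small = unfolding_size_estimate(3, (4, 3))                  # 3 * 6 * 5
large = unfolding_size_estimate(10_000, (1000, 1000, 1000))
```

For a three-state model with bounds 4 and 3 the estimate is 90, whereas three bounds of 1000 on a model with ten thousand states already give more than $10^{13}$ states.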

The efficient analysis of single-objective queries with a single bound $\Phi_1 = \mathcal{P}^{\max}_M(\langle C \rangle_{\leq b}\, G)$ has recently been addressed in e.g. [27,34]. The key observation is that the unfolding $M_{\mathit{unf}}$ can be decomposed into $b + 2$ *epoch model* MDPs $M^b, \dots, M^0, M^\bot$ corresponding to the cost epochs. The epoch models are copies of $M$ with only slight adaptations. Reachability probabilities in copies corresponding to epoch $i$ only depend on the copies $\{\, M^j \mid j \leq i \vee j = \bot \,\}$. It is thus possible to analyse $M^\bot, \dots, M^b$ sequentially instead of considering all copies at once. In particular, it is not necessary to construct the full unfolding.
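The sequential idea for a single bound can be sketched in a few lines of Python. The sketch is a simplification under stated assumptions, not the implementation of [27,34]: epochs $0, \dots, b$ are processed in ascending order, epoch $\bot$ is implicit and contributes probability 0, and each epoch is solved by a fixed number of plain value-iteration sweeps without any error guarantee. The model is a single-cost projection of an MDP resembling $M_{\mathit{ex}}$; the branch costs not stated in Example 1 are assumed.

```python
def cost_bounded_reach(transitions, goal, s_init, bound, sweeps=100):
    """Maximal probability to reach `goal` with accumulated cost <= bound."""
    states = list(transitions)
    x = {}  # x[e][s]: value of state s when the remaining budget is e
    for e in range(bound + 1):  # ascending order: smaller budgets first
        cur = {s: (1.0 if s in goal else 0.0) for s in states}
        for _ in range(sweeps):  # value iteration inside epoch e
            for s in states:
                if s in goal:
                    continue
                best = 0.0
                for mu in transitions[s]:
                    val = 0.0
                    for prob, cost, target in mu:
                        if cost == 0:
                            val += prob * cur[target]          # stays in epoch e
                        elif e - cost >= 0:
                            val += prob * x[e - cost][target]  # analysed earlier
                        # otherwise the bound is exceeded: contributes 0
                    best = max(best, val)
                cur[s] = best
        x[e] = cur
    return x[bound][s_init]


# Branches are (probability, cost in the first cost structure, successor).
T1 = {
    "s0": [
        [(0.5, 1, "s0"), (0.5, 0, "s1")],  # try s1; success cost 0 is assumed
        [(0.5, 2, "s0"), (0.5, 2, "s2")],  # try s2; failure cost 2 is assumed
    ],
    "s1": [[(1.0, 0, "s0")]],  # assumed zero-cost return
    "s2": [[(1.0, 0, "s0")]],  # assumed zero-cost return
}

prob_bound_1 = cost_bounded_reach(T1, {"s1"}, "s0", 1)
prob_bound_4 = cost_bounded_reach(T1, {"s1"}, "s0", 4)
```

With bound 1 the sketch yields 0.75, matching the first entry of the second probability vector in Example 2: the move to $s_1$ may fail once and still succeed within the budget. Only the value tables of earlier epochs are consulted, so the full unfolding is never built.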

We lift this idea to multi-objective tradeoffs. The single-objective case is notably simpler in that reaching a goal state for the first time or exceeding the cost bound immediately suffices to determine whether the one property is satisfied. In particular, while $M^\bot$ is just one sink state in the single-objective case, its structure is more involved here.

**Fig. 4.** An epoch model of $M_{\mathit{ex}}$

We first formalise the notion of *epoch models* for multiple bounds. The aim is to build an MDP for each epoch $\boldsymbol{e} \in \boldsymbol{E}_m$ that can be analysed via standard model checking techniques using the weighted expected cost encoding of objective probabilities. The state space of an epoch model consists of up to one copy of each original state for each goal satisfaction vector $\boldsymbol{g} \in \boldsymbol{G}_m$. Additional sink states $\langle s_\bot, \boldsymbol{g} \rangle$ encode the target for a jump to *any* other cost epoch $\boldsymbol{e}' \neq \boldsymbol{e}$. We consider $\ell$ cost structures to encode the objective probabilities. Let function $\mathit{satObj}_\Phi \colon \boldsymbol{G}_m \times \boldsymbol{G}_m \to \{0, 1\}^\ell$ assign value 1 in entry $k$ iff a reachability property $e_k$ is satisfied according to the second goal vector but was not satisfied in the first. For the transitions' branches, we distinguish two cases: (1) If the successor epoch $\boldsymbol{e}' = \mathit{succ}(\boldsymbol{e}, \boldsymbol{c})$ with respect to the *original* cost $\boldsymbol{c} \in \mathbb{N}^m$ is the same as the current epoch $\boldsymbol{e}$, we jump to the successor state as before, and update the goal satisfaction. We collect the *new* costs for the *objectives* if updating the goal satisfaction newly satisfies an objective as given by $\mathit{satObj}_\Phi$. (2) If the successor epoch $\boldsymbol{e}' = \mathit{succ}(\boldsymbol{e}, \boldsymbol{c})$ is different from the current epoch $\boldsymbol{e}$, the probability is rerouted to the sink state with the corresponding goal state satisfaction vector. The collected costs contain the part of the goal satisfaction as in (1), but also the results obtained by analysing the reached epoch $\boldsymbol{e}'$, given by a function $f$.

**Definition 8.** *The* epoch model *of MDP $M$ as in Definition 1 for $\boldsymbol{e} \in \boldsymbol{E}_m$ and a function $f \colon \boldsymbol{G}_m \times \mathrm{Dist}(\mathbb{N}^m \times S) \to [0, 1]^\ell$ is the MDP $M^{\boldsymbol{e}}_f = \langle S^{\boldsymbol{e}}, T^{\boldsymbol{e}}_f, \langle s_{\mathit{init}}, \boldsymbol{0} \rangle \rangle$ with $\ell$ cost structures, $S^{\boldsymbol{e}} \stackrel{\text{def}}{=} (S \uplus s_\bot) \times \boldsymbol{G}_m$, $T^{\boldsymbol{e}}_f(\langle s_\bot, \boldsymbol{g} \rangle) = \{\, \mathcal{D}(\langle \boldsymbol{0}, \langle s_\bot, \boldsymbol{g} \rangle \rangle) \,\}$, and for every $\tilde{s} = \langle s, \boldsymbol{g} \rangle \in S^{\boldsymbol{e}}$ and $\mu \in T(s)$, there is some $\nu \in T^{\boldsymbol{e}}_f(\tilde{s})$ defined by:*


Figure 4 shows an epoch model $M^{\boldsymbol{e}}_f$ of the MDP $M_{\mathit{ex}}$ in Fig. 2 with respect to tradeoff $\Phi$ as in Example 2 and any epoch $\boldsymbol{e} \in \boldsymbol{E}_2$ with $\boldsymbol{e}[1] \neq \bot$ and $\boldsymbol{e}[2] \neq \bot$.

**Input:** MDP $M = \langle S, T, s_{\mathit{init}} \rangle$, tradeoff $\Phi = \mathit{multi}\big(\mathcal{P}^{\max}_M(e_1), \dots, \mathcal{P}^{\max}_M(e_\ell)\big)$ with bound values $b_1, \dots, b_m$, weight vector $\boldsymbol{w} \in [0, 1]^\ell$ and proper epoch sequence $\mathfrak{E}$ ending with $\mathrm{last}(\mathfrak{E}) = \langle b_1, \dots, b_m \rangle$

**Output:** Point $\boldsymbol{p}_{\boldsymbol{w}} \in \mathbb{R}^\ell$ satisfying Eq. 3

**1** **foreach** $\boldsymbol{e} \in \mathfrak{E}$ in ascending order **do**<br>
**2** &emsp;**foreach** $\boldsymbol{g} \in \boldsymbol{G}_m$, $\mu \in \{\nu \mid \exists s \colon \nu \in T(s)\}$ **do**<br>
**3** &emsp;&emsp;$\boldsymbol{z} \leftarrow \boldsymbol{0}$<br>
**4** &emsp;&emsp;**foreach** $\langle \boldsymbol{c}, s' \rangle \in \mathrm{support}(\mu)$ **do**<br>
**5** &emsp;&emsp;&emsp;$\boldsymbol{e}' \leftarrow \mathit{succ}(\boldsymbol{e}, \boldsymbol{c})$; $\boldsymbol{g}' \leftarrow \mathit{succ}(\boldsymbol{g}, s', \boldsymbol{e}')$<br>
**6** &emsp;&emsp;&emsp;**if** $\boldsymbol{e}' \neq \boldsymbol{e}$ **then**<br>
**7** &emsp;&emsp;&emsp;&emsp;$\boldsymbol{z} \leftarrow \boldsymbol{z} + \mu(\boldsymbol{c}, s') \cdot x^{\boldsymbol{e}'}[\langle s', \boldsymbol{g}' \rangle]$<br>
**8** &emsp;&emsp;$f(\boldsymbol{g}, \mu) \leftarrow \boldsymbol{z}$<br>
**9** &emsp;build epoch model $M^{\boldsymbol{e}}_f = \langle S^{\boldsymbol{e}}, T^{\boldsymbol{e}}_f, s^{\boldsymbol{e}}_{\mathit{init}} \rangle$<br>
**10** &emsp;$\mathfrak{S} \leftarrow \arg\max_{\mathfrak{S}'} \mathcal{E}^{\mathfrak{S}'}_{M^{\boldsymbol{e}}_f}(\boldsymbol{w})$<br>
**11** &emsp;**foreach** $k \in \{1, \dots, \ell\}$, $\tilde{s} \in S^{\boldsymbol{e}}$ **do**<br>
**12** &emsp;&emsp;$x^{\boldsymbol{e}}[\tilde{s}][k] \leftarrow \mathcal{E}^{\mathfrak{S}}_{M^{\boldsymbol{e}}_f}(\boldsymbol{1}_k)[\tilde{s}]$<br>
**13** **return** $x^{\mathrm{last}(\mathfrak{E})}[s^{\mathrm{last}(\mathfrak{E})}_{\mathit{init}}]$

**Algorithm 1.** Sequential multi-cost bounded analysis

*Remark 1.* The structure of $M^{\boldsymbol{e}}_f$ differs only slightly between epochs. In particular, consider epochs $\boldsymbol{e}$, $\boldsymbol{e}'$ with $\boldsymbol{e}[i] = \bot$ iff $\boldsymbol{e}'[i] = \bot$. To construct epoch model $M^{\boldsymbol{e}'}_f$ from $M^{\boldsymbol{e}}_f$, only transitions to the bottom states $\langle s_\bot, \boldsymbol{g} \rangle$ need to be adapted.

To analyse an epoch model $M^{\boldsymbol{e}}_f$, any successor epoch $\boldsymbol{e}'$ of $\boldsymbol{e}$ needs to be analysed before. Since costs are non-negative, we can ensure this by analysing the epochs in a specific order. In the single-dimensional case the order is uniquely given by $\bot, 0, 1, \dots, b$. For multiple cost bounds any linearisation of the partial order ${\preceq} \subseteq \boldsymbol{E}_m \times \boldsymbol{E}_m$ with $\boldsymbol{e}' \preceq \boldsymbol{e}$ iff $\boldsymbol{e}'[i] \leq \boldsymbol{e}[i] \vee \boldsymbol{e}'[i] = \bot$ for all $i$ can be considered. We call such a linearisation a *proper epoch sequence*.
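One way to obtain a proper epoch sequence is to enumerate the epochs lexicographically. In the Python sketch below, $\bot$ is encoded as $-1$ so that ordinary integer comparison places it below every number; this encoding and the function names are assumptions of the sketch. Lexicographic order extends the componentwise order, hence every epoch appears after all epochs it may jump to.

```python
from itertools import product

BOT = -1  # ⊥ encoded below 0 so that integer comparison orders it first


def proper_epoch_sequence(bounds):
    """All epochs for the given bound values, in lexicographic order."""
    return list(product(*(range(BOT, b + 1) for b in bounds)))


def precedes(e1, e2):
    """e1 ⪯ e2: in every entry, e1 is ⊥ or not larger than e2."""
    return all(a <= b for a, b in zip(e1, e2))


sequence = proper_epoch_sequence((4, 3))
```

For the bound values 4 and 3 the sequence has $6 \cdot 5 = 30$ entries, starts with the epoch in which both bounds are exceeded, and ends with the epoch of the bound values themselves.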

We compute the points $\boldsymbol{p}_{\boldsymbol{w}}$ by analysing the different epoch models (i.e. the coordinates of Fig. 3(b)) sequentially. The main procedure is outlined in Algorithm 1. The costs of the model for the current epoch are computed in lines 2-8. These costs comprise the results from previously analysed epochs $\boldsymbol{e}'$. In lines 9-12, the current epoch model $M^{\boldsymbol{e}}_f$ is built and analysed: We compute weighted expected costs on $M^{\boldsymbol{e}}_f$ where $\mathcal{E}^{\mathfrak{S}}_{M^{\boldsymbol{e}}_f}(\boldsymbol{w})[s]$ denotes the expected costs for $M^{\boldsymbol{e}}_f$ when changing the initial state to $s$. In line 10 a (deterministic and memoryless) scheduler $\mathfrak{S}$ that induces the maximal weighted expected costs (i.e. $\mathcal{E}^{\mathfrak{S}}_{M^{\boldsymbol{e}}_f}(\boldsymbol{w})[s] = \max_{\mathfrak{S}'} \mathcal{E}^{\mathfrak{S}'}_{M^{\boldsymbol{e}}_f}(\boldsymbol{w})[s]$ for all states $s$) is computed. In line 12 we then compute the expected costs induced by $\mathfrak{S}$ for the individual objectives.

**Theorem 1.** *The output of Algorithm 1 satisfies Eq. 3.*

*Proof (sketch).* Let $\boldsymbol{e}$ be the currently analysed epoch. Since $\mathfrak{E}$ is assumed to be a *proper* epoch sequence, we already processed any reachable successor epoch $\boldsymbol{e}'$ of $\boldsymbol{e}$, i.e., line 7 is only executed for epochs $\boldsymbol{e}'$ for which $x^{\boldsymbol{e}'}$ has already been computed. One can show that the values $x^{\boldsymbol{e}}[\langle s, \boldsymbol{g} \rangle][k]$ computed by the algorithm coincide with the probability to satisfy $e'_k$ from state $\langle s, \boldsymbol{e}, \boldsymbol{g} \rangle$ in the unfolding $M_{\mathit{unf}}$ under a scheduler $\mathfrak{S}$ that maximises the weighted sum.

*Error propagation.* So far, we assumed that (weighted) expected costs $\mathcal{E}^{\mathfrak{S}}_M(\boldsymbol{w})$ are computed exactly. Practical implementations, however, are often based on numerical methods that only approximate the correct solution. In fact, methods based on value iteration, the de-facto standard in MDP model checking, do not give any guarantee on the accuracy of the obtained result [26]. We therefore consider interval iteration [5,9] which for a predefined precision $\varepsilon > 0$ guarantees that the obtained result $x_s$ is $\varepsilon$-precise, i.e. we have $|x_s - \mathcal{E}^{\mathfrak{S}}_M(\boldsymbol{w})[s]| \leq \varepsilon$.

For the single-cost bounded variant of Algorithm 1, [27] discusses that in order to compute $\mathcal{P}^{\max}_M(\langle C \rangle_{\leq b}\, G)$ with precision $\varepsilon$, each epoch model needs to be analysed with precision $\frac{\varepsilon}{b+1}$. We generalise this result to multi-dimensional tradeoffs. Assume the results of previously analysed epochs (given by $f$) are $\varepsilon$-precise and that $M^{\boldsymbol{e}}_f$ is analysed with precision $\delta$. As in the single-dimensional case, the total error for $M^{\boldsymbol{e}}_f$ can accumulate to $\delta + \varepsilon$. Since a path through the MDP $M$ can visit at most $\prod_{i=1}^{m} (b_i + 1)$ cost epochs whose analysis introduces error $\delta$, the overall error can be upper bounded by $\delta \cdot \prod_{i=1}^{m} (b_i + 1)$.

**Theorem 2.** *If the values* $x^{e}[\tilde{s}][k]$ *at line 12 of Algorithm 1 are computed with precision* $\varepsilon / \sum_{i=1}^{m}(b_i + 1)$ *for some* ε > 0*, the output* $p'_w$ *of the algorithm satisfies* $|p_w - p'_w| \cdot w \leq \varepsilon$ *where* $p_w$ *is as in Eq. 3.*

*Remark 2.* Alternatively, epochs can be analysed with the desired overall precision ε by lifting the results from topological interval iteration [5]. However, that requires storing the obtained bounds for the results of already analysed epochs.

#### **3.3 Extensions**

*Minimising objectives.* Objectives $\mathcal{P}^{\min}_{M}(e_k)$ can be handled by extending the function $\mathit{satObj}_{\Phi}$ in Definition 8 such that it assigns cost −1 to branches that lead to the satisfaction of $e_k$. To obtain the desired probabilities we then maximise negative costs and multiply the result by −1 afterwards. As interval iteration supports mixtures of positive and negative costs [5], arbitrary combinations of minimising and maximising objectives can be considered<sup>1</sup>.

*Beyond upper bounds.* Our approach also supports bounds of the form $\langle \mathcal{C}_j \rangle_{\sim b}\, G$ for ∼ ∈ {<, ≤, >, ≥}, i.e., we allow *combinations* of lower and upper cost bounds. For integer costs, strict upper bounds < b and non-strict lower bounds ≥ b are replaced by ≤ b − 1 and > b − 1, respectively. For bound $\langle \mathcal{C}_{i} \rangle_{> b_i}\, G_i$ we adapt the update of goal satisfactions such that $\mathit{succ}(g, s, e)[i] = 1$ if either $g[i] = 1$ or $s \in G_i \land e[i] = \bot$. Similarly, we support multi-bounded-single-goal queries of the form $\langle \mathcal{C}_{(j_1,\dots,j_n)} \rangle_{(\sim_1 b_1,\dots,\sim_n b_n)}\, G$, which characterises the paths π with a single prefix $\pi_{\mathit{fin}}$ satisfying $\mathit{last}(\pi_{\mathit{fin}}) \in G$ and *all* cost bounds, i.e., $\mathit{cost}_{j_i}(\pi_{\mathit{fin}}) \sim_i b_i$.

<sup>1</sup> This supersedes a restriction of the algorithm of [24].

**Fig. 5.** Pareto curves

*Example 3.* The formula $e = \langle \mathcal{C}_{(1,1)} \rangle_{(\leq 1, \geq 1)}\, G$ expresses the paths that reach G while collecting exactly one cost w.r.t. the first cost structure. This formula is not equivalent to $e' = \langle \mathcal{C}_1 \rangle_{\leq 1}\, G \land \langle \mathcal{C}_1 \rangle_{\geq 1}\, G$ since, e.g., for $G = \{ s_0 \}$ the path $\pi = s_0 \langle 2 \rangle s_0$ satisfies $e'$ but not $e$.
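The distinction in Example 3 can be checked mechanically. The following is a minimal sketch, assuming a path is given as a list of states plus the cost of each step; the helper names `prefix_costs`, `single_prefix` and `conjunction` are hypothetical and only serve this illustration.

```python
def prefix_costs(states, costs):
    """Yield (last state, accumulated cost) for every finite prefix of the path."""
    total = 0
    yield states[0], total
    for cost, state in zip(costs, states[1:]):
        total += cost
        yield state, total


def single_prefix(states, costs, goal, lo, hi):
    """Multi-bounded single-goal query: ONE prefix ends in goal and meets both bounds."""
    return any(s in goal and lo <= c <= hi for s, c in prefix_costs(states, costs))


def conjunction(states, costs, goal, lo, hi):
    """Conjunction of two queries: each bound may be met by a different prefix."""
    prefixes = list(prefix_costs(states, costs))
    meets_upper = any(s in goal and c <= hi for s, c in prefixes)
    meets_lower = any(s in goal and c >= lo for s, c in prefixes)
    return meets_upper and meets_lower


# The path from Example 3: s0, then one step of cost 2 back to s0, with G = {s0}.
states, costs, goal = ["s0", "s0"], [2], {"s0"}
holds_single = single_prefix(states, costs, goal, lo=1, hi=1)
holds_conjunction = conjunction(states, costs, goal, lo=1, hi=1)
```

The empty prefix (cost 0) satisfies the upper bound and the full path (cost 2) satisfies the lower bound, so the conjunction holds, but no single prefix has cost exactly 1.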

*Expected cost objectives.* We can consider cost-bounded expected cost objectives $\mathbb{E}^{\mathit{opt}}_{M}(\mathcal{R}_{j_1}, \langle \mathcal{C}_{j_2} \rangle_{\leq b})$ with $\mathit{opt} \in \{\max, \min\}$, which refer to the expected cost accumulated for cost structure $j_1$ within a given cost bound $\langle \mathcal{C}_{j_2} \rangle_{\leq b}$. Similar to cost-bounded reachability queries, we compute cost-bounded expected costs via computing (weighted) expected costs within epoch models.

*Quantiles.* A (multi-dimensional) quantile has the form $\mathit{Qu}(\mathcal{P}^{\mathit{opt}}_{M}(e) \sim p)$ for $\mathit{opt} \in \{\min, \max\}$, ∼ ∈ {<, ≤, >, ≥}, $e = \bigwedge_{i=1}^{n \in \mathbb{N}} (\langle \mathcal{C}_{j_i} \rangle_{\sim_i b_i}\, G_i)$ and a fixed probability threshold p ∈ [0, 1]. The quantile asks for the set of bound values B that satisfy the probability threshold, i.e., $B = \{ \langle b_1, \dots, b_n \rangle \mid \mathcal{P}^{\mathit{opt}}_{M}(e) \sim p \}$. The computation of quantiles for single-cost bounded reachability has been discussed in [3,34], where multiple cost bounds are supported via unfolding. Unfolding requires fixing the bound values $b_2, \dots, b_n$ a priori, and one can only ask for all $b_1$ that satisfy the property. Our approach provides the basis for lifting the ideas of [3,34] to multi-bounded queries. Roughly, one extends the epoch sequence E in Algorithm 1 dynamically until the epochs in which the bounded reachability probability passes the threshold p are explored. Additional steps such as detecting the case where B = ∅ are left for future work.

### **4 Visualisations**

**Fig. 6.** Two-dimensional plots of Pareto-optimal schedulers for different quantities (Color figure online)

The results of a multi-objective model checking analysis are typically presented as a single (approximation of a) Pareto curve. For more than two objectives, the performance of the Pareto-optimal scheduler can be displayed in a bar chart as in Fig. 4, where the colours reflect different objectives and the groups different schedulers. The aim is to visualise the tradeoffs between the different objectives such that the user can make an informed decision about the system design or pick a scheduler for implementation. However, Pareto set visualisations alone may not provide sufficient information about, e.g., which objectives are aligned or conflicting (see e.g. [39] for a discussion in the non-probabilistic case). Cost bounds furthermore add an extra dimension for each cost structure. Consider the Mars rover MDP $M_r$ and tradeoff $\mathit{multi}(\mathit{obj}_{100}, \mathit{obj}_{140})$ with

$$\mathit{obj}_v = \mathcal{P}_{M_r}^{\max}(\langle \mathcal{C}_{\mathit{time}} \rangle_{\leq 175} \, B \land \langle \mathcal{C}_{\mathit{energy}} \rangle_{\leq 100} \, B \land \langle \mathcal{C}_{\mathit{value}} \rangle_{\geq v} \, B)$$

where *B* is the set of states where the rover has safely returned to its base. We ask for the tradeoff between performing experiments of scientific value at least 100 before returning to base within 175 time units and maximum energy consumption of 100 units (*obj* <sup>100</sup>) vs. achieving the same with scientific value at least 140 (*obj* <sup>140</sup>). The Pareto curve (Fig. 5(a)) shows the tradeoff between achieving *obj* <sup>100</sup> and *obj* <sup>140</sup>. However, for each Pareto-optimal scheduler, our method has implicitly computed the probabilities of the two objectives for all reachable epochs as well, i.e. for all bounds on the three quantities below the ones required in the tradeoff. We visualise this information for deep insights into the behaviour of each scheduler, its robustness w.r.t. the bounds, and its preferences for certain objectives depending on the remaining budget for each quantity.

We use plots as shown in Fig. 6. They can be generated at no extra runtime or memory cost since all required data is already computed implicitly. We restrict to two-dimensional plots since they are easier to grasp than complex three-dimensional visualisations. In each plot, we can thus show the relationship between three different quantities: one on the x-axis (*x*), one on the y-axis (*y*), and one encoded as the colour of the points (*z*, where we use blue for high values, red for low values, black for probability zero, and white for unreachable epochs). Yet our example tradeoff already contains five quantities: the probability for *obj* <sup>100</sup>, the probability for *obj* <sup>140</sup>, the available time and energy to be spent, and the remaining scientific value to be accumulated. We thus need to project out some quantities. We do this by showing at every *x*, *y* coordinate the maximum or minimum value of the *z* quantity when ranging over *all* reachable values of the hidden *costs* at this coordinate. That is, we show a best- or worst-case situation, depending on the semantics of the respective quantities.
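The projection step described above can be sketched as follows. This is only an illustration under assumptions: the per-epoch results are taken to be available as a dictionary from epoch tuples to the *z* quantity, and the function name `project` as well as the sample numbers are hypothetical, not data from the experiments.

```python
def project(values, x_index, y_index, worst_case=True):
    """Collapse hidden dimensions: keep min (worst case) or max (best case) of z.

    values maps an epoch tuple, e.g. (time, energy, value), to the z quantity.
    Coordinates that never occur in values stay absent (unreachable, shown white).
    """
    pick = min if worst_case else max
    plot = {}
    for epoch, z in values.items():
        key = (epoch[x_index], epoch[y_index])
        plot[key] = z if key not in plot else pick(plot[key], z)
    return plot


# Hypothetical probabilities for epochs (time, energy, value).
values = {
    (10, 5, 0): 0.9,
    (10, 5, 2): 0.4,
    (10, 6, 0): 0.7,
}
worst = project(values, x_index=0, y_index=1, worst_case=True)
best = project(values, x_index=0, y_index=1, worst_case=False)
```

Here the value dimension is hidden, so the coordinate (10, 5) shows 0.4 in the worst-case plot and 0.9 in the best-case plot.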

Out of the 30 possible combinations of quantities for our example, we showcase three to illustrate the added value of the obtained information. First, in Fig. 6(a), we plot the probabilities of the two objectives vs. the minimum scientific value that still needs to be accumulated for two different Pareto-optimal schedulers (left: S1, right: S2). White areas indicate that no epoch for the particular combination of probabilities is reachable from the tradeoff's bounds. These two and all other Pareto-optimal schedulers are white above the diagonal, which means that *obj* <sup>100</sup> implies *obj* <sup>140</sup>, i.e. the objectives are aligned. For the left scheduler, we further see that all blue-ish areas are associated to lower probabilities for both objectives. Since blue indicates higher values, this scheduler achieves only low probabilities when it still needs to make the rover accumulate a high amount of value. However, it overall achieves higher probabilities for *obj* <sup>140</sup> at medium value requirements, whereas the right scheduler is "safer" and focuses on satisfying *obj* <sup>100</sup>. The erratic spikes on the left occur because some probabilities are only reached after very unlikely paths.

In Fig. 6(b), we show for S<sup>1</sup> the probability to achieve *obj* <sup>100</sup> depending on the remaining energy to be spent vs. the remaining scientific value to be accumulated. We see a white vertical line for every odd *x* -value; this is because, over all branches in the model, the gcd of all value costs is 2. The left plot shows the minimum probabilities over the hidden costs, i.e. we see the probability for the worst-case remaining time; the right plot shows the best-case scenario. Not surprisingly, when time is low, only a lot of energy makes it possible to reach the objective with non-zero probability.


**Table 1.** Runtime comparison for multi-cost single-objective queries

**Table 2.** Runtime comparison for multi-cost multi-objective queries


Finally, Fig. 6(c) shows the probability for *obj* <sup>140</sup> depending on available time and energy for S2. We plot the minimum probability over the hidden scientific value requirement, i.e. a worst-case view. The plot shows that time is of little use in case of low remaining energy, but it helps significantly when there is sufficient energy, too. In Fig. 6(d), we depict for the same scheduler the minimum remaining scientific value (*z* ) under which a certain probability for *obj* <sup>100</sup> can be achieved (*y*), given a certain remaining time budget (*x* ). The upper left corner shows that a high probability in little time is only achievable if we need to collect little more value; the value requirement gradually relaxes as we aim for lower probabilities or have more time.

#### **5 Experiments**

*Implementation.* We implemented the presented approach in Storm [20] v1.2; the implementation is available via [19]. It computes extremal probabilities for single-objective multi-cost bounded queries, as well as Pareto curves for the multi-objective case. We consider the *sparse* engine of Storm, i.e., explicit data structures such as sparse matrices. For single-cost bounded properties, this has already been addressed in [34]. The expected costs (lines 10 to 12 of Algorithm 1) are computed either numerically (via interval iteration over finite-precision floats) or exactly (via policy iteration over infinite-precision rationals). To reduce the memory consumption, the analysis result of an epoch model $M^{e}_{f}$ is erased as soon as possible.

**Fig. 7.** Runtime (y-axis) of SEQ (+) and UNF (×) for increasing cost bounds (x-axis)

*Set-up & reproduction.* We evaluate the approach on a wide range of case studies, available in the artefact [30]. The models are given in Prism's [37] guarded command language. For each case study we consider single- and multi-objective queries that yield non-trivial results, i.e., probabilities strictly between zero and one. We compare the naive unfolding approach (UNF) as in Sect. 3.1 with the sequential approach (SEQ) as in Sect. 3.2. The unfolding of the model is applied on the Prism language level, by considering a parallel composition with cost counting structures. On the unfolded model we apply the algorithms for unbounded reachability as available in Storm. We considered precision $\eta = 10^{-4}$ for the Pareto curve approximation and precision $\varepsilon = 10^{-6}$ for interval iteration. We increased the precision for single epoch models as in Theorem 2.

We ran our experiments on a single core (2 GHz) of an HP BL685C G7 system with 192 GB of memory. We stopped each experiment after a time limit of 2 hours. For experiments that completed within the time limit, we observed a memory consumption of up to 36 GB for UNF and up to 5 GB for SEQ.

A binary equivalent to the binary we used for the experiments is available in the artefact [30]. The binary has been tested in the artefact evaluation VM [31]. For other configurations, Storm should be recompiled using the sources [19].

Details on reproduction of the tables, as well as details on how to analyse multi-cost bounded properties using Storm in general can be found in the readme, enclosed in the artefact.

*Experimental Results.* Tables 1 and 2 show results for single- and multi-objective queries, respectively. The first columns give the number of states and transitions of the original MDP, then for the query, the number of bounds m, the number of *different* cost structures r, and the number of reachable cost epochs (reflecting the magnitude of the bound values). $|S_{\mathit{unf}}|$ denotes the number of reachable states in the unfolding. For multi-objective queries, we additionally give the number of objectives and the number of analysed weight vectors *w*. The remaining columns depict the runtimes of the different approaches in seconds. For UNF, we considered both the sparse (sp) and symbolic (dd) engine of Storm. The symbolic engine neither supports multi-objective model checking nor exact policy iteration.

On the majority of benchmarks, SEQ performs better than UNF. Typically, SEQ is less sensitive to increases in the magnitude of the cost bounds, as illustrated in Fig. 7. For three benchmark and query instances, we plot the runtime of both approaches against different numbers |E| of reachable epochs. While for small cost bounds UNF is sometimes even faster than SEQ, SEQ scales better with increasing |E|. This is not surprising: ultimately, the increased state space and the accompanying memory consumption in UNF are a bottleneck. The most important reason that UNF performs better for some (smaller) cost bounds is the overhead induced by checking the full epoch. In particular, the epoch contains (often many) states that are not reachable from the initial state (in the unfolding).

#### **6 Conclusion**

Many real-world planning problems consider several limited resources and contain tradeoffs. This paper presents a practically efficient approach to analyse these problems. It has been implemented in the Storm model checker and shows significant performance benefits. The algorithm implicitly computes a large amount of information that is hidden in the standard plots of Pareto curves shown to visualise the results of a multi-objective analysis. We have developed a new set of visualisations that exploit all the available data to provide new and clear insights to decision makers even for problems with many objectives and cost dimensions.

**Data Availability Statement.** The datasets analysed during the current study, and the binary used for the analysis, are available in the figshare repository [30]. Source code matching the binary is available in [19].

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **A Statistical Model Checker for Nondeterminism and Rare Events**

Carlos E. Budde<sup>1</sup>, Pedro R. D'Argenio<sup>2,3,4</sup>, Arnd Hartmanns<sup>1(B)</sup>, and Sean Sedwards<sup>5</sup>

> <sup>1</sup> University of Twente, Enschede, The Netherlands {c.e.budde,a.hartmanns}@utwente.nl <sup>2</sup> Universidad Nacional de Córdoba, Córdoba, Argentina dargenio@famaf.unc.edu.ar <sup>3</sup> CONICET, Córdoba, Argentina <sup>4</sup> Saarland University, Saarbrücken, Germany <sup>5</sup> University of Waterloo, Waterloo, Canada sean.sedwards@uwaterloo.ca

**Abstract.** Statistical model checking avoids the state space explosion problem in verification and naturally supports complex non-Markovian formalisms. Yet as a simulation-based approach, its runtime becomes excessive in the presence of rare events, and it cannot soundly analyse nondeterministic models. In this tool paper, we present modes: a statistical model checker that combines fully automated importance splitting to efficiently estimate the probabilities of rare events with smart lightweight scheduler sampling to approximate optimal schedulers in nondeterministic models. As part of the Modest Toolset, it supports a variety of input formalisms natively and via the Jani exchange format. A modular software architecture allows its various features to be flexibly combined. We highlight its capabilities with an experimental evaluation across multi-core and distributed setups on three exemplary case studies.

### **1 Introduction**

Statistical model checking (SMC [30,49]) is a formal verification technique for stochastic systems. Using a formal stochastic model, specified as e.g. a continuous-time Markov chain (CTMC) or a stochastic Petri net (SPN), SMC can answer questions such as "what is the probability of system failure between two inspections" or "what is the expected time to complete a given workload". It is gaining popularity for complex applications where traditional exhaustive probabilistic model checking is limited by the state space explosion problem and by its inability to efficiently handle non-Markovian formalisms or complex continuous dynamics. At its core, SMC is the integration of classical Monte Carlo simulation with formal models. By only sampling concrete traces of the model's behaviour, its memory usage is effectively constant in the size of the state space, and it is applicable to any behaviour that can effectively be simulated.

© The Author(s) 2018 D. Beyer and M. Huisman (Eds.): TACAS 2018, LNCS 10806, pp. 340–358, 2018. https://doi.org/10.1007/978-3-319-89963-3\_20

This work is supported by the 3TU.BSR project, ERC grant 695614 (POWVER), the JST ERATO HASUO Metamathematics for Systems Design project (JPMJER1603), the NWO SEQUOIA project, and SeCyT-UNC projects 05/BP12 and 05/B497.

The result of an SMC analysis is an *estimate* $\hat{q}$ of the actual quantity q together with a statistical statement on the potential error. A typical guarantee is that, with probability δ, any $\hat{q}$ will be within ±ε of q. To strengthen such a guarantee, i.e. increase δ or decrease ε, more samples (that is, simulation runs) are needed. Compared to exhaustive model checking, SMC thus trades memory usage for accuracy or runtime. A particular challenge lies in *rare events*, i.e. behaviours of very low probability. Meaningful estimates need a small *relative* error: for a probability on the order of $10^{-19}$, for example, ε should reasonably be on the order of $10^{-20}$. In a standard Monte Carlo approach, this would require infeasibly many simulation runs.

SMC naturally works for formalisms with non-Markovian behaviour and complex continuous dynamics, such as generalised semi-Markov processes (GSMP) and stochastic hybrid Petri nets with many generally distributed transitions [42], for which the exact model checking problem is intractable or undecidable. As a simulation-based approach, however, SMC is incompatible with nondeterminism. Yet (continuous and discrete) nondeterministic choices are desirable in formal modelling for concurrency, abstraction, and to represent absence of knowledge. They occur in many formalisms such as Markov decision processes (MDP) or probabilistic timed automata (PTA [38]). In the presence of nondeterminism, quantities of interest are defined w.r.t. optimal *schedulers* (also called policies, adversaries or strategies) that resolve all nondeterministic choices: the verification result is the *maximum* or *minimum* probability or expected value ranging over *all* schedulers. Many SMC tools that appear to support nondeterministic models as input, e.g. Prism [37] and Uppaal smc [14], implicitly use a single hidden scheduler by resolving all choices randomly. Results are thus only guaranteed to lie *somewhere* between minimum and maximum. Such implicit resolutions are a known problem affecting the trustworthiness of simulation studies [36].

In this paper, we present a statistical model checker, modes, that addresses both of the above challenges: It implements *importance splitting* [45] to efficiently estimate the probabilities of rare events and *lightweight scheduler sampling* [39] to statistically approximate optimal schedulers. Both methods can be combined to perform rare event simulation for nondeterministic models.

*Rare Event Simulation.* The key challenge in rare event simulation (RES) is to achieve a high degree of automation for a general class of models. Current approaches to automatically derive the importance function for importance splitting, which is critical for the method's performance, are mostly limited to restricted classes of models and properties, e.g. [7,18]. modes combines several importance splitting techniques with the compositional importance function construction of Budde et al. [5] and two different methods to derive levels and splitting factors [4]. These method combinations apply to arbitrary stochastic models with a partly discrete state space. We have shown them to work well across different Markovian and non-Markovian automata- and dataflow-based formalisms [4]. We present details on modes' support for RES in Sect. 3. Alongside Plasma lab [40], which implements automatic *importance sampling* [33] and semi-automatic importance splitting [32,34] for Markov chains (with APIs allowing for extensions to other models), modes is one of the most automated tools for RES on formal models today. In particular, we are not aware of any other tool that provides fully automated RES on general stochastic models.

*Nondeterminism.* Sound SMC for nondeterministic models is a hard problem. For MDP, Brázdil et al. [3] proposed a sound machine learning technique to incrementally improve a partial scheduler. Uppaal Stratego [13] explicitly synthesises a "good" scheduler before using it for a standard SMC analysis. Both approaches suffer from worst-case memory usage linear in the number of states as all scheduler decisions must be stored explicitly. Classic memory-efficient sampling approaches like the one of Kearns et al. [35] address discounted models only. modes implements the lightweight scheduler sampling (LSS) approach introduced by Legay et al. [39]. It is currently the only technique that applies to reachability probabilities and undiscounted expected rewards (as typically considered in formal verification) and that also keeps memory usage effectively constant in the number of states. Its efficiency depends only on the likelihood of sampling near-optimal schedulers. modes implements the existing LSS approaches for MDP [39] and PTA [10,26] and supports unbounded properties on Markov automata (MA [16]). We describe modes' LSS implementation in Sect. 4.

*The modes Tool.* modes is part of the Modest Toolset [24], which also includes the explicit-state model checker mcsta and the model-based tester motest [21]. It inherits the toolset's support for a variety of input formalisms, including the high-level process algebra-based Modest language [22] and xSADF [25], an extension of scenario-aware dataflow. Many other formalisms are supported via the Jani interchange format [6]. As simulation is easily and efficiently parallelisable, modes fully exploits multi-core systems, but can also be run in a distributed fashion across homogeneous or heterogeneous clusters of networked systems. We describe the various methods implemented to make modes a correct and scalable statistical model checker that supports classes of models ranging from CTMC to stochastic hybrid automata in Sect. 2. We focus on its software architecture in Sect. 5. Finally, Sect. 6 uses three very different case studies to highlight the varied kinds of models and analyses that modes can handle.

*Previous Publications.* modes was first described in a tool demonstration paper in 2012 [2]. Its focus was on the use of partial order and confluence reduction-based techniques [27] to decide on-the-fly if the nondeterminism in a model is spurious, i.e. whether maximum and minimum values are the same and an implicit randomised scheduler can safely be used. modes was again mentioned as a part of the Modest Toolset in 2014 [24]. Since then, modes has been completely redesigned. The partial order and confluence-based methods have been replaced by LSS, enabling the simulation of non-spurious nondeterminism; automated importance splitting has been implemented for rare event simulation; support for MA and a subset of stochastic hybrid automata (SHA [22]) has been added; and the statistical evaluation methods have been extended and improved. Concurrently, advances in the shared infrastructure of the Modest Toolset, now at version 3, provide access to new modelling features and formalisms as well as support for the Jani specification.

### **2 Ingredients of a Statistical Model Checker**

A statistical model checker performs a number of tasks to analyse a given formal model w.r.t. a property of interest. In this section, we describe these tasks, their challenges, and how modes implements them. All random selections in an SMC tool are typically resolved by a *pseudo*-random number generator (PRNG). For brevity, we write "random" to mean "pseudo-random" in this section.

*Simulating Different Model Types.* The most basic task is *simulation*: the generation of random samples—*simulation runs*—from the probability distribution over behaviours defined by the model. modes contains simulation algorithms specifically optimised for the following types of models:


*Properties and Termination.* SMC computes a value for the property on every simulation run. A run is a finite trace; consequently, standard SMC only works for linear-time properties that can be decided on finite traces. modes supports


Transient queries may be time- and reward-bounded. A state formula is an expression over the (discrete and continuous) variables of the model without any temporal operators. A reward structure assigns a rate reward $r(s) \in \mathbb{R}$ to every state $s$ and a branch reward $r(b) \in \mathbb{R}$ to every probabilistic branch $b$ of every transition. An example transient query is "what is the probability to reach a destination (*goal*) within an energy budget (a reward bound) while avoiding collisions (*avoid*)". Expected reward queries allow asking for e.g. the expected number of retransmissions (the reward) until a message is successfully transmitted (*goal*) in a wireless network protocol. Every query q can be turned into a *requirement* $q \sim c$ by adding a comparison ∼ ∈ {≤, ≥} to a constant value $c \in \mathbb{R}$.

A simulation run ends when the value of a property is decided. For transient properties, this is the case when reaching an *avoid* state or a deadlock (value 0), or a *goal* state (value 1). To ensure termination, the probability of eventually encountering one of these events must be 1. modes additionally implements cycle detection: it keeps track of a configurable number n of previously visited states. When a run returns to a previous state without intermediate steps of probability <1, it will loop forever on this cycle and the run has value 0. modes uses n = 1 by default for good performance while still allowing models built for model checking, which avoid deadlocks but often contain terminal states with self-loops, to be simulated. For expected rewards, when entering a *goal* state, the property is decided with the value being the sum of the rewards along the run.
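The termination rules and the bounded cycle detection described above can be sketched as follows. This is a minimal illustration, not the modes implementation: it assumes a discrete-time model given as a dictionary from states to lists of (probability, successor) branches, and all function and state names are hypothetical.

```python
import random
from collections import deque


def sample_branch(branches, rng):
    """Pick a successor according to the branch probabilities."""
    u, acc = rng.random(), 0.0
    for prob, succ in branches:
        acc += prob
        if u < acc:
            return succ
    return branches[-1][1]  # guard against rounding in the probabilities


def simulate_transient(model, start, goal, avoid, rng, n=1):
    """Return 1 if the run reaches goal, 0 on avoid, deadlock or detected cycle."""
    recent = deque(maxlen=n)  # states left via probability-1 steps, at most n of them
    state = start
    while True:
        if state in avoid:
            return 0
        if state in goal:
            return 1
        branches = model.get(state, [])
        if not branches:
            return 0  # deadlock
        if len(branches) == 1:  # assumed to be a single branch of probability 1
            recent.append(state)
            state = branches[0][1]
            if state in recent:
                return 0  # back at a remembered state without any random choice
        else:
            recent.clear()  # a step of probability < 1 breaks any potential cycle
            state = sample_branch(branches, rng)


# Toy model: "stuck" is a terminal state with a self-loop, caught with n = 1.
model = {"init": [(0.5, "done"), (0.5, "stuck")], "stuck": [(1.0, "stuck")]}
rng = random.Random(42)
runs = [simulate_transient(model, "init", {"done"}, set(), rng) for _ in range(1000)]
```

With n = 1 only self-loops are caught; a deterministic cycle over two states needs n = 2 in this sketch, which mirrors the tradeoff between detection power and per-step cost mentioned in the text.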

*Statistical Evaluation of Samples.* A set of n simulation runs provides a sequence of independent values $v_1, \dots, v_n$ for the property. The sample mean $\hat{v}_n = \frac{1}{n} \sum_{i=1}^{n} v_i$ is an unbiased estimator of the actual probability or expected reward v. An SMC tool must stop generating runs at some point, and quantify the statistical properties of the estimate $\hat{v} = \hat{v}_n$ returned to the user. modes implements the following methods:


specify any two of ε, δ and n, out of which the missing value can be computed. The APMC method can be used as a hypothesis test for $P(\cdot) \sim c$ by checking whether $\hat{v} \geq c + \varepsilon$ or $\hat{v} \leq c - \varepsilon$, and returning undecided if neither is true.

– modes also implements Wald's **SPRT**, the sequential probability ratio test [47]. As a sequential hypothesis test, it has no predetermined n, but decides on-the-fly whether more samples are needed as they come in. It is a test for Bernoulli-distributed quantities, i.e. it only applies to transient requirements of the form $P(\cdot) \sim c$. For indifference level ε and error α, it stops when the samples collected so far provide sufficient evidence to decide between $P(\cdot) \geq c + \varepsilon$ or $P(\cdot) \leq c - \varepsilon$ with probability ≤ α of wrongly accepting either hypothesis.

For a more detailed description of these and other statistical methods and especially hypothesis tests for SMC, we refer the interested reader to [44].
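As an illustration of the sequential test, the following is a minimal sketch of the textbook form of Wald's SPRT for Bernoulli samples with the same error bound α for both hypotheses. It is not taken from modes; the function name `sprt`, the `max_samples` cut-off and the result strings are assumptions made for this example.

```python
import math


def sprt(sample, c, eps, alpha, max_samples=1_000_000):
    """Decide between P >= c + eps and P <= c - eps from Bernoulli samples.

    sample() returns True/False for one simulation run. Returns (decision, runs used).
    """
    p0, p1 = c + eps, c - eps
    if not 0.0 < p1 < p0 < 1.0:
        raise ValueError("need 0 < c - eps < c + eps < 1")
    accept_low = math.log((1 - alpha) / alpha)  # threshold for accepting P <= c - eps
    accept_high = -accept_low                   # threshold for accepting P >= c + eps
    llr = 0.0  # log-likelihood ratio of (P = c - eps) against (P = c + eps)
    for used in range(1, max_samples + 1):
        if sample():
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= accept_low:
            return "P <= c - eps", used
        if llr <= accept_high:
            return "P >= c + eps", used
    return "undecided", max_samples


# Degenerate samplers make the behaviour easy to follow by hand.
always_true = sprt(lambda: True, c=0.5, eps=0.1, alpha=0.05)
always_false = sprt(lambda: False, c=0.5, eps=0.1, alpha=0.05)
```

Each sample moves the log-likelihood ratio by a fixed amount, so clear-cut cases stop after a handful of runs, while probabilities close to c require many more samples.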

*Distributed Sample Generation.* Simulation is easily and efficiently parallelisable. Yet a naïve implementation of the statistical evaluation, which processes the values from the runs in the order they flow in, risks introducing a bias in a parallel setting. Consider estimating the probability of system failure when simulation runs that encounter failure states are shorter than other runs, and thus quicker. In parallel simulation, failure runs will tend to arrive earlier and more frequently, thus overestimating the probability of failure. To avoid such bias, modes uses the adaptive schedule first implemented in Ymer [48]. It adapts to differences in the speed of nodes by scheduling to process more future results from fast nodes when current results come in quickly. It always commits to a schedule *a priori* before the actual results arrive, ensuring the absence of bias. It is thus well-suited for heterogeneous clusters of machines with significant performance differences.
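The core idea of committing to a consumption order before results arrive can be sketched with a fixed round-robin schedule. This is a deliberate simplification: the adaptive schedule mentioned above additionally changes how many results it takes from each node, which is not modelled here, and the function name and data layout are hypothetical.

```python
from collections import deque


def consume_round_robin(arrivals, num_workers, needed):
    """Consume results in the fixed order worker 0, 1, ..., k-1, 0, ...

    arrivals yields (worker id, value) in arrival order; results that arrive
    before their turn are buffered instead of being counted immediately.
    """
    buffers = [deque() for _ in range(num_workers)]
    consumed, turn = [], 0
    for worker, value in arrivals:
        buffers[worker].append(value)
        while len(consumed) < needed and buffers[turn]:
            consumed.append(buffers[turn].popleft())
            turn = (turn + 1) % num_workers
        if len(consumed) == needed:
            break
    return consumed


# Worker 0 delivers its (quick) failure runs first, worker 1 is slower.
arrivals = [(0, 1), (0, 1), (0, 1), (1, 0), (1, 0), (1, 0)]
naive = [value for _, value in arrivals][:4]          # first four arrivals
scheduled = consume_round_robin(arrivals, num_workers=2, needed=4)
```

In this toy input the naive evaluation of the first four arrivals sees three failures, whereas the scheduled evaluation takes two results from each worker regardless of arrival time.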

### **3 Automated Rare Event Simulation**

With the standard confidence of δ = 0.95, we have $n \approx 0.37/\varepsilon^2$ in the APMC method: for every decimal digit of precision, the number of runs increases by a factor of 100. If we attempt to estimate probabilities on the order of $10^{-4}$, i.e. $\varepsilon \approx 10^{-5}$, we need billions of runs and days or weeks of simulation time. This is the problem tackled by rare event simulation (RES) techniques [45]. modes implements RES for transient properties via *importance splitting*, which iteratively increases the simulation effort for states "closer" to the goal set. Closeness is represented by an *importance function* $f_I : S \to \mathbb{N}$ that maps each state in $S$ to its importance in $\{ 0, \dots, \max f_I \}$. The performance, but not the correctness, of all splitting methods hinges on the quality of the importance function.
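The run counts quoted above follow directly from the stated approximation; the worked arithmetic below only instantiates it with the numbers from the text.

$$n \approx \frac{0.37}{\varepsilon^{2}}, \qquad \varepsilon = 10^{-5} \;\Rightarrow\; n \approx \frac{0.37}{10^{-10}} = 3.7 \times 10^{9}$$

$$\frac{0.37}{(\varepsilon/10)^{2}} = 100 \cdot \frac{0.37}{\varepsilon^{2}}$$

The second line is the factor of 100 per additional decimal digit of precision.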

*Deriving Importance Functions.* Traditionally, the importance function is specified ad hoc by a RES expert. Striving for usability by *domain* experts, modes implements the compositional importance function generation method of [5] that is applicable to any compositional stochastic model $M = M_1 \parallel \ldots \parallel M_n$ with a partly discrete state space. We write $s|_i$ for the projection of state $s$ of $M$ to the discrete local variables of component $M_i$. The method works as follows [4]:

**Fig. 1.** Illustration of Restart [4]

**Fig. 2.** Illustration of fixed effort [4]


The method takes into account both the structure of the goal set formula and the structure of the state space. This is in contrast to the approach of Jégourel et al. [32], implemented in a semi-automated fashion in Plasma lab [34,40], that only considers the structure of the (more complex linear-time) logical property. The memory usage of the compositional method is determined by the number of discrete local states (required to be finite) over all components. Typically, component state spaces are small even when the composed state space explodes.

*Levels and Splitting Factors.* We also need to specify *when* and *how much* to "split", i.e. increase the simulation effort. For this purpose, the values of the importance function are partitioned into *levels* and a *splitting factor* is chosen for each level. Splitting too much too often will degrade performance (oversplitting), while splitting too little will cause starvation, i.e. few runs that reach the rare event. It is thus critical to choose good levels and splitting factors. Again, to avoid the user having to make these choices ad hoc, modes implements two methods to compute them automatically. One is based on the sequential Monte Carlo splitting technique [8], while the other method, named *expected success* [4], has been newly developed for modes. It strives to find levels and factors such that, in expectation, one run moves up from each level to the next.

*Importance Splitting Runs.* The derivation of importance function, levels and splitting factors is a preprocessing step. Importance splitting then replaces the simulation algorithm by a variant that takes this information into account to more often encounter the rare event. modes implements three importance splitting techniques: Restart, fixed effort and fixed success.

Restart [46] is illustrated in Fig. 1: As soon as a Restart run crosses the threshold into a higher level, $n_\ell - 1$ new child runs are started from the first state in the new level, where $n_\ell$ is the splitting factor of level $\ell$. When a run moves below its creation level, it ends. It also ends on reaching an *avoid* or *goal* state. The result of a Restart run, consisting of a main and several child runs, is the number of runs that reach *goal* times $1/\prod_\ell n_\ell$, i.e. a rational number ≥ 0.

Runs of the *fixed effort* method [17,19], illustrated in Fig. 2, are rather different. They consist of a fixed number of partial runs on each level, each of which ends when it crosses into the next higher level or encounters a *goal* or *avoid* state. When all partial runs for a level have ended, the next round starts from the previously encountered initial states of the next higher level. When a fixed effort run ends, the fraction of partial runs started in a level that moved up approximates the conditional probability of reaching the next level given that the current level was reached. If *goal* states exist only on the highest level, the overall result is the product of all of these fractions, i.e. a rational number in [0, 1].
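The result of a single fixed effort run can be sketched as a product of per-level fractions. The following is a minimal illustration under the assumption stated above, namely that *goal* states exist only on the highest level; the function name `fixed_effort_result` and its list-based inputs are hypothetical and do not reflect the modes implementation.

```python
from fractions import Fraction


def fixed_effort_result(started, moved_up):
    """Product over all levels of (partial runs that moved up) / (runs started).

    started[l]  : number of partial runs started in level l
    moved_up[l] : number of those runs that reached the next level
                  (or a goal state, for the highest level)
    """
    if len(started) != len(moved_up):
        raise ValueError("need one pair of counts per level")
    result = Fraction(1)
    for n_started, n_up in zip(started, moved_up):
        if not 0 <= n_up <= n_started or n_started == 0:
            raise ValueError("counts must satisfy 0 <= moved_up <= started, started > 0")
        # Each fraction approximates the conditional probability of
        # reaching the next level given that this level was reached.
        result *= Fraction(n_up, n_started)
    return result


estimate = fixed_effort_result(started=[4, 4, 4], moved_up=[2, 1, 3])
print(estimate)
```

Using exact fractions makes it visible that the result is a rational number in [0, 1], in contrast to the 0/1 outcome of a plain simulation run.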

Fixed success [1] is a variant of fixed effort that generates partial runs until a fixed number of them have reached the next higher level. For all three methods, the average of the result of many runs is again an unbiased estimator for the probability of the transient property [19]. However, each run is no longer a Bernoulli trial. Of the statistical evaluation methods offered by modes, only CI with normal confidence intervals is thus applicable. For a deeper discussion of the challenges in the statistical evaluation of rare event simulation results, we refer the interested reader to [43]. To the best of our knowledge, modes is today the most automated rare event simulator for general stochastic models. In particular, it defaults to the combination of Restart with the expected success method for level calculation, which has shown consistently good performance [4].

### **4 Scheduler Sampling for Nondeterminism**

Resolving nondeterminism in a randomised way leads to estimates that only lie *somewhere* between the desired extremal values. In addition to computing probabilities or expected rewards, we also need to find a (near-)optimal scheduler.

*Lightweight Scheduler Sampling.* modes implements the lightweight scheduler sampling (LSS) approach for MDP of [39] that identifies a scheduler by a single integer (typically of 32 bits). This allows us to randomly select a large number m of schedulers (i.e. integers), perform standard or rare event simulation for each, and report the maximum and minimum estimates over all sampled schedulers as approximations of the actual extremal values. We show the core of the lightweight approach (performing a simulation run for a given scheduler identifier σ) for MDP and transient properties as Algorithm 1. An MDP consists of a countable set of states $S$, a transition function $T$ that maps each state to a finite *set* of probability distributions over successor states, and an initial state $s_0$.

**Input:** MDP $\langle S, T, s_0 \rangle$, transient property $\varphi$, scheduler id $\sigma \in \mathbb{Z}$

**1** $s := s_0$, $\pi := s_0$

**2** **while** $\varphi(\pi) = \mathit{undecided}$ **do**

**3** &emsp;$U_{nd}.\mathrm{initialise}(H(\sigma.s))$ // use hash of $\sigma$ and $s$ as seed for $U_{nd}$

**4** &emsp;**if** $T(s) = \emptyset$ **then return** false // end of run due to deadlock

**5** &emsp;$\mu := U_{nd} \cdot |T(s)|$-th element of $T(s)$ // use $U_{nd}$ to select transition

**6** &emsp;$s' := \mu \circ U_{pr}.\mathrm{next}()$ // use $U_{pr}$ to select successor state according to $\mu$

**7** &emsp;$\pi := \pi.s'$, $s := s'$ // append $s'$ to $\pi$ and continue from $s'$

**8** **return** $\varphi(\pi)$

**Algorithm 1.** Simulation for an MDP and a fixed scheduler id [10]

The algorithm uses two PRNGs: $U_{pr}$ to simulate the probabilistic choices (line 6), and $U_{nd}$ to resolve the nondeterministic ones (line 5). We want σ to represent a deterministic memoryless scheduler: within one simulation run as well as in different runs for the same value of σ, $U_{nd}$ must always make the same choice for the same state $s$. To achieve this, $U_{nd}$ is re-initialised with a seed based on σ and $s$ in every step (line 3). The overall effectiveness of the lightweight approach only depends on the likelihood of selecting a σ that represents a (near-)optimal scheduler. We want to sample "uniformly" from the space of all schedulers to avoid actively biasing against "good" schedulers. Algorithm 1 achieves this naturally for MDP.
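The structure of Algorithm 1 can be sketched in Python as follows. This is an illustration under several assumptions, not the modes implementation: the MDP is a dictionary from states to lists of distributions, SHA-256 stands in for the hash H, Python's `random.Random` stands in for both PRNGs, and the step bound `max_steps` is added only so that the sketch always terminates. All names (`scheduler_seed`, `simulate_run`, the toy model) are hypothetical.

```python
import hashlib
import random


def scheduler_seed(sigma, state):
    """Stand-in for H(sigma.s): hash the scheduler id and state to an integer."""
    digest = hashlib.sha256(f"{sigma}|{state!r}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simulate_run(transitions, s0, prop, sigma, u_pr, max_steps=100_000):
    """One run for scheduler id sigma; prop(path) returns True, False or None."""
    s, path = s0, [s0]
    u_nd = random.Random()
    for _ in range(max_steps):
        verdict = prop(path)
        if verdict is not None:          # property decided
            return verdict
        u_nd.seed(scheduler_seed(sigma, s))    # line 3: re-seed from sigma and s
        choices = transitions.get(s, [])
        if not choices:                  # line 4: deadlock ends the run
            return False
        mu = choices[u_nd.randrange(len(choices))]   # line 5: pick a distribution
        successors = [succ for succ, _ in mu]
        weights = [p for _, p in mu]
        s = u_pr.choices(successors, weights=weights)[0]   # line 6
        path.append(s)                   # line 7
    raise RuntimeError("property still undecided after max_steps")


# Toy MDP: in "start" the scheduler chooses between two Dirac distributions.
TOY_MDP = {"start": [[("goal", 1.0)], [("trap", 1.0)]]}


def reach_goal(path):
    if path[-1] == "goal":
        return True
    if path[-1] == "trap":
        return False
    return None


same_scheduler = {simulate_run(TOY_MDP, "start", reach_goal, 7, random.Random(i))
                  for i in range(20)}
```

Because the nondeterministic choice depends only on σ and the current state, the set `same_scheduler` contains a single outcome even though each run uses a differently seeded probabilistic PRNG.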

*Beyond MDP.* LSS can be adapted to any model and type of property where the class of optimal schedulers only uses *discrete* input to make its decision for every state [26]. This is obviously the case for discrete-space discrete-time models like MDP. It means that LSS can directly be applied to MA and *time-unbounded* properties, too. In addition to MDP and MA, modes also supports two LSS methods for PTA, based on a variant of forwards reachability with zones [10] and the region graph abstraction [26], respectively. While the former includes zone operations with worst-case runtime exponential in the number of clocks, the latter implements all operations in linear time. It exploits a novel data structure for regions based on representative valuations that performs very well in practice [26]. Extending LSS to models with general continuous probability distributions such as stochastic automata [11] is hindered by optimal schedulers requiring non-discrete information (the values and expiration times of all clocks [9]). modes currently provides prototypical LSS support for SA encoded in a particular form and various restricted classes of schedulers as described in [9].

*Bounds and Error Accumulation.* The results of an SMC analysis with LSS are lower bounds for maximum and upper bounds for minimum values up to the specified statistical error and confidence. They can thus be used to e.g. *disprove* safety (the maximum probability to reach an unsafe state is above a threshold) or *prove* schedulability (there is a scheduler that makes it likely to complete the workload in time), but not the opposite. The accumulation of statistical error introduced by the repeated simulation experiments over multiple schedulers must also be accounted for. [12] shows how to modify the APMC method accordingly and turn the SPRT into a correct sequential test *over schedulers*. In addition to these, modes allows the CI method to be used with LSS by applying the standard Šidák correction for multiple comparisons. This enables LSS for expected rewards and RES. All the adjustments essentially increase the required confidence depending on the (maximum) number of schedulers to be sampled.
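The effect of the Šidák correction on the required confidence can be sketched as follows. The sketch uses the textbook form of the correction for m independent comparisons and treats δ as the overall confidence level; how modes applies the correction internally is not described here, so the function name `sidak_confidence` and its use are assumptions.

```python
def sidak_confidence(overall_confidence, num_schedulers):
    """Per-scheduler confidence so that all m intervals hold jointly.

    Textbook Sidak correction for independent comparisons:
    per-test confidence = overall_confidence ** (1 / m).
    """
    if num_schedulers < 1:
        raise ValueError("need at least one scheduler")
    return overall_confidence ** (1.0 / num_schedulers)


for m in (1, 20, 100, 1000):
    print(m, sidak_confidence(0.95, m))
```

The per-scheduler confidence approaches 1 as m grows, which is why sampling more schedulers requires more runs per scheduler for the same overall guarantee.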

*Two-Phase and Smart Sampling.* If an SMC analysis for fixed statistical parameters would need n runs on a deterministic model, it will need significantly more than m · n runs for a nondeterministic model when m schedulers are sampled due to the increase in the required confidence. modes implements a two-phase approach and smart sampling [12] to reduce this overhead. The former's first phase consists of performing n simulation runs for each of the m schedulers. The scheduler that resulted in the maximum (or minimum) value is selected, and independently evaluated once more with n runs to produce the final estimate. The first phase is a heuristic to find a near-optimal scheduler before the second phase estimates the value under this scheduler according to the required statistical parameters. Smart sampling generalises this principle to multiple phases, dropping only the "worst" *half* of the evaluated schedulers between phases. It starts with an informed guess of good initial values for n and m. For details, see [12]. Smart sampling tends to find more extremal schedulers faster while the two-phase approach has predictable performance as it always needs (m + 1) · n runs. We thus use the two-phase approach for our experiments in Sect. 6.
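The two-phase approach for maximum probabilities can be sketched in a few lines. This is a simplified illustration, not the modes implementation: `run_once` is a hypothetical callback that performs one simulation run for a scheduler id and returns whether the property was satisfied, and the toy success probabilities are invented for the example.

```python
import random


def two_phase_maximum(run_once, scheduler_ids, n, rng):
    """Phase 1: n runs per scheduler; phase 2: n fresh runs for the best one."""

    def estimate(sigma):
        return sum(1 for _ in range(n) if run_once(sigma, rng)) / n

    # Phase 1 is only a heuristic search for a near-optimal scheduler.
    best = max(scheduler_ids, key=estimate)
    # Phase 2 re-evaluates the winner independently for the final estimate.
    return best, estimate(best)


# Toy setting: scheduler 1 always satisfies the property, the others never do.
SUCCESS_PROBABILITY = {0: 0.0, 1: 1.0, 2: 0.0}
calls = []


def toy_run(sigma, rng):
    calls.append(sigma)
    return rng.random() < SUCCESS_PROBABILITY[sigma]


best, value = two_phase_maximum(toy_run, [0, 1, 2], n=50, rng=random.Random(1))
print(best, value, len(calls))
```

The call counter confirms the predictable cost of (m + 1) · n runs: here 3 schedulers with 50 runs each in the first phase, plus 50 runs in the second.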

### **5 Architecture and Implementation**

modes is implemented in C# and works on Linux, Mac OS X and Windows systems. It builds on a solid foundation of shared infrastructure with other tools of the Modest Toolset. This includes input language parsers that map Modest, xSADF and Jani input into a common internal metamodel for networks of stochastic hybrid automata with rewards and discrete variables. Before simulation, every model is compiled to bytecode, making the metamodel executable. The same compilation engine is also used by the mcsta and motest tools.

The architecture of the SMC-specific part of modes is shown as a class diagram in Fig. 3. Boxes represent classes, with rounded rectangles for abstract classes and invisible boxes for interfaces. Solid lines are inheritance relations. Dotted lines are associations, with double arrows for collection associations. The architecture mirrors the three distinct tasks of a statistical model checker: the generation of individual simulation runs and per-run evaluation of properties, implemented in modes by *RunGenerator* and *RunEvaluator*, respectively; the coordination of simulation over multiple threads across CPU cores and networked machines, implemented by classes derived from *Worker* and *IWorkerHost*; and the statistical evaluation of simulation runs, implemented by *PropertyEvaluator*.

The central component of modes' architecture is the *Master*. It compiles the model, derives the importance function, sends both to the workers (on the same or different machines), and instantiates a *PropertiesJob* for every partition of the properties to be analysed that can share simulation runs.<sup>1</sup> Each *PropertiesJob* then posts simulation jobs back to the master in parallel or in sequence. A simulation job is a description of how to generate and evaluate runs: which run type (i.e. *RunGenerator* derived class) to use, whether to wrap it in an importance splitting method, whether to simulate for a specific scheduler id, which compiled expressions to evaluate to determine termination and the values of the runs, etc. The master allocates posted jobs to available simulation threads offered by the workers, and notifies workers when a job is scheduled for one of their threads. As the result for an individual run is handed from the *RunEvaluator* by the *RunGenerator* via the workers to the master, it is fed into a *Sequentialiser* that implements the adaptive schedule for bias avoidance. Only after that, possibly at a later point, is it handed on to the *PropertiesJob* for statistical evaluation.

**Fig. 3.** The software architecture of the modes statistical model checker

<sup>1</sup> Using the same set of runs for multiple properties is an optimisation at the cost of statistical independence. modes can also generate independent runs for each property.

For illustration, consider a *PropertiesJob* for LSS with 10 schedulers, RES with Restart, and the expected success method for level calculation. It is given the importance function by the master, and its first task is to compute the levels. It posts a simulation job for fixed effort runs with level information collection to the master. Depending on the current workload from other *PropertiesJob*s, the master will allocate many threads to this job. Once enough results have come in, the *PropertiesJob* terminates the simulation job, computes the levels and splitting factors, and starts with the actual simulations: It selects 10 random scheduler identifiers and concurrently posts for each of them a simulation job for Restart runs. The master will try to allocate available threads evenly over these jobs. As results come in, the evaluation may finish early for some schedulers, at which point the master will be instructed to stop the corresponding simulation job. It can then allocate the newly free threads to other jobs. This scheme results in a maximal exploitation of the available parallelism across workers and threads.

Due to the modularity of this architecture, it is easy to extend modes in different ways. For example, to support a new type of model (say, non-linear hybrid automata) or a new RES method, only a new *(I)RunGenerator* needs to be implemented. Adding another statistical evaluation method from [44] means adding a new *PropertyEvaluator*, and so on.

In distributed simulation, an instance of modes is started on each node with the --server parameter. This results in the creation of an instance of the *Server* class instead of a *Master*, which listens for incoming connections. Once all servers are running, a master can be started with a list of hosts to connect to. modes comes with a template script to automate this task on slurm-based clusters.

### **6 Experiments**

We present three case studies in this section. They have been chosen to highlight modes' capabilities in terms of the diverse types of models it supports, its ability to distribute work across compute clusters, and the new analyses possible with RES and LSS. None of them has been studied before with modes or the combinations of methods that we apply here. Our experiments ran on an Intel Core i7-4790 workstation (3.6–4.0 GHz, 4 cores), a homogeneous cluster of 40 AMD Opteron 4386 nodes (3.1–3.8 GHz, 8 cores), and an inhomogeneous cluster of 15 nodes with different Intel Xeon processors. All systems run 64-bit Linux. We use 1, 2 or 4 simulation threads on the workstation (denoted "1", "2" and "4" in our tables), and n nodes with t simulation threads each on the clusters (denoted "n × t"). We used a one-hour timeout, marked "—" in the tables. Note that runtimes cannot directly be compared between the workstation and the clusters.

**Table 1.** Performance and scalability on the electric vehicle charging case study

*Electric Vehicle Charging.* We first consider a model of an electric vehicle charging station. It is a Modest model adapted from the "extended" case study of [42]: a stochastic hybrid Petri net with general transitions, which in turn is based on the work in [31]. The scenario we model is of an electric vehicle being connected to the charger every evening in order to be recharged the next morning. The charging process may be delayed due to high load on the power grid, and the exact time at which the vehicle is needed in the morning follows a normal distribution. We consider one week of operation and compute the probability that the desired level of charge is not reached on any *nfail* ∈ { 2,..., 5 } of the seven mornings.

This model is not amenable to exhaustive model checking due to the non-Markovian continuous probability distributions and the hybrid dynamics modelling the charging process. However, it is deterministic. We thus applied modes with standard Monte Carlo simulation (MC) as well as with RES using Restart. We performed the same analysis on different configurations of the workstation and the homogeneous cluster. To compare MC and RES, we use CI with δ = 0.95 and a relative half-width of 10 % for both. All other parameters of modes are set to default values, which implies an automatic compositional importance function and the expected success method to determine levels and splitting factors. The results are shown in Table 1. Row "conf. interval" gives the average confidence intervals that we obtained over all experiments.

RES starts to noticeably pay off as soon as probabilities are on the order of $10^{-4}$. The runtime of Restart is known to heavily depend on the levels and splitting factors, and we indeed noticed large variations in runtime for RES over several repetitions of the experiments. The runtimes for RES should thus not be used to judge the speedup w.r.t. parallelisation. However, when looking at the MC runtimes, we see good speedups as we increase the number of threads per node, and near-ideal speedups as we increase the total number of nodes, as long as there is a sufficient amount of work.


**Table 2.** Performance and results for the low-latency wireless network case study

Although this model was not designed with RES in mind and has only moderately rare events, the fully automated methods of modes could be applied directly, and they significantly improved performance. For a detailed experimental comparison of the RES methods implemented in modes on a larger set of examples, including events with probabilities as low as $4.8 \cdot 10^{-23}$, we refer the reader to [4].

*Low-latency Wireless Networks.* We now turn to the PTA model of a low-latency wireless networking protocol being used among three stations, originally presented in [15]. We take the original model, increase the probability of message loss, and make one of the communication links nondeterministically drop messages. This allows us to study the influence of the message loss probabilities and the protocol's robustness to adversarial interference. The model *is* amenable to model checking, as demonstrated in [15]. It allows us to show that modes can be applied to such models originally built for traditional verification, and since we can calculate the precise maximum and minimum values of all properties via model checking, we have a reference to evaluate the results of LSS.

We show the results of using modes with LSS on this model in Table 2. Row "optimal" lists the maximum and minimum probabilities computed via model checking for three properties: the probability that the protocol fails within four iterations, and that either the first or the second station goes offline. We used the two-phase LSS method with m = 100 schedulers on the workstation, and with m = 1000 schedulers on the homogeneous cluster. The intervals are the averages of the min. and max. values returned by all analyses. The statistical evaluation is APMC with δ = 0.95 and ε = 0.0025, which means that 59556 simulation runs are needed per scheduler.

Near-optimal schedulers for the minimum probabilities do not appear to be rare: we find good bounds for the minima even with 100 schedulers. However, for maximum probabilities, sampling more schedulers pays off in terms of better approximations. In all cases, the results are conservative approximations of the actual optima (as expected), and they are clearly more useful than the single value that would be obtained by other tools via a (hidden) randomised scheduler. Performance scales ideally with parallelism on the cluster, and still linearly on the workstation. For a deeper evaluation of the characteristics of LSS, including experiments on models too large for model checking, we refer the reader to the description of the original approach [12,39] and its extensions to PTA [10,26].


**Table 3.** Performance and results for the reliable database system case study

*Redundant Database System.* The redundant database system [20] is a classic RES case study. It models a system consisting of six disk clusters of *R* + 2 disks each plus two types of processors and disk controllers with *R* copies of each type. Component lifetimes are exponentially distributed. Components fail in one of two modes with equal probability, each mode having a different repair rate. The system is operational as long as fewer than R processors of each type, R controllers of each type, and R disks in each cluster are currently failed. The model is a CTMC with a state space too large and a transition matrix too dense for it to be amenable to model checking with symbolic tools like Prism [37].

In the original model, any number of failed components can be repaired in parallel. We consider this unrealistic, and extend the model by a *repairman* that can repair a single component at a time. If more than one component fails during a repair, then as soon as the current repair is finished, the repairman has to decide which to repair next. Instead of enforcing a particular repair policy, we leave this decision as nondeterministic. The model thus becomes an MA. We use LSS in combination with RES to investigate the impact of the repair policy. We study the scenario where one *component* of each kind (one disk, one processor, one controller) is in failed state, and estimate the probability for *system* failure before these components are repaired. The minimum probability is achieved by a perfect repair strategy, while the maximum results from the worst possible one.

Table 3 shows the results of our LSS-plus-RES analysis with modes using default RES parameters and sampling m = 20 schedulers. Due to the complexity of the model, we ran this experiment on the inhomogeneous cluster only, using 16 cores on each node for 240 concurrent simulation threads in total. We see that RES needs a somewhat rare event to improve performance. We also compare LSS to the uniform randomised scheduler (as implemented in many other SMC tools). It results in a single confidence interval for the probability of failure. With LSS, we get two intervals. They do not overlap when R ≥ 3, i.e. the repair strategy matters: a bad strategy makes failure approximately twice as likely as a good strategy! Since the results of LSS are conservative, the difference between the worst and the best strategy may be even larger.

*Experiment Replication.* To enable independent replication of our experimental results, we have created a publicly available evaluation artifact [23]. It contains the version of modes and the model files used for our experiments, the raw experimental results, summarising tabular views of those results (from which we derived Tables 1, 2 and 3), and a Linux shell script to automatically replicate a subset of the experiments. Since the complete experiments take several hours to complete and require powerful hardware and computer clusters, we have selected a subset for the replication script. Using the virtual machine of the TACAS 2018 Artifact Evaluation [28] on typical workstation hardware of 2017, it runs to completion in less than one hour while still substantiating our main results.

### **7 Conclusion**

We presented modes, the Modest Toolset's distributed statistical model checker. It provides methods to tackle both of the prominent challenges in simulation: nondeterminism and rare events. Its modular software architecture allows its various features to be easily combined and extended. For the first time, we used lightweight scheduler sampling with Markov automata, and combined it with rare event simulation to gain insights into a challenging case study that, currently, cannot be analysed for the same aspects with any other tool that we are aware of. modes is available for download at www.modestchecker.net.

**Acknowledgments.** The authors thank Carina Pilch and Sebastian Junges for their support with the vehicle charging and wireless networks case studies.

**Data Availability.** The data generated in our experimental evaluation is archived and available at DOI 10.4121/uuid:64cd25f4-4192-46d1-a951-9f99b452b48f [23].

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Temporal Logic and Mu-calculus

# **Permutation Games for the Weakly Aconjunctive** *µ***-Calculus**

Daniel Hausmann, Lutz Schröder, and Hans-Peter Deifel

> Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany {daniel.hausmann,lutz.schroeder}@fau.de

**Abstract.** We introduce a natural notion of limit-deterministic parity automata and present a method that uses such automata to construct satisfiability games for the weakly aconjunctive fragment of the μ-calculus. To this end we devise a method that determinizes limit-deterministic parity automata of size n with k priorities through limit-deterministic Büchi automata to deterministic parity automata of size O((nk)!) and with O(nk) priorities. The construction relies on limit-determinism to avoid the full complexity of the Safra/Piterman construction by using partial permutations of states in place of Safra trees. By showing that limit-deterministic parity automata can be used to recognize unsuccessful branches in pre-tableaux for the weakly aconjunctive μ-calculus, we obtain satisfiability games of size O((nk)!) with O(nk) priorities for weakly aconjunctive input formulas of size n and alternation-depth k. A prototypical implementation that employs a tableau-based global caching algorithm to solve these games on-the-fly shows promising initial results.

### **1 Introduction**

The modal μ-calculus [15] is an expressive logic for reasoning about concurrent systems. Its satisfiability problem is ExpTime-complete [5]. Due to nesting of fixpoints, the semantic structure of the μ-calculus is quite involved, which is reflected in the high degree of sophistication of reasoning algorithms for the μ-calculus. One convenient modular approach is the definition of suitable *satisfiability games* (e.g. [10]); solving such games (i.e. computing their winning regions) then amounts to deciding the satisfiability of the input formulas. A standard method for obtaining satisfiability games is to first construct a *tracking automaton* that accepts the *bad branches* in a pre-tableau for the input formula, i.e. those that infinitely defer satisfaction of a least fixpoint; this automaton then is determinized and complemented, and the satisfiability game is built over the carrier set of the resulting automaton. The moves in the game are those transitions from the automaton that correspond to applications of tableau-rules; the existence of a winning strategy in this game ensures the existence of a model, i.e. a locally coherent structure that does not contain bad branches. As they typically incur exponential blowup, good determinization procedures for automata on infinite words play a crucial role in standard decision procedures for the satisfiability problem of the μ-calculus and its fragments; in particular, better determinization procedures lead to smaller satisfiability games which are easier to solve.

The *weakly aconjunctive* μ-calculus [15,24] restricts occurrences of recursion variables in conjunctions but is still quite expressive, e.g. it can define winning regions in parity games with a bounded number of priorities [4]. The key observation for the present paper is that in the weakly aconjunctive case, pre-tableau branches are made 'bad' by a single formula; this implies that the tracking automaton for such formulas is *limit-deterministic*, i.e. that it is sufficient to deterministically track a single formula from some point on. This motivates a notion of *limit-deterministic parity automata* in which all accepting runs are deterministic from some point on. Because the nondeterminism is restricted to finite prefixes of accepting runs in such automata, they can be determinized in a simpler way than unrestricted parity automata. We present a reformulation of a recent determinization method for limit-deterministic *Büchi* automata [6]. The method is inspired by, but significantly less involved than, the more general Safra/Piterman construction [19,20], essentially due to the fact that the tree structure of Safra trees collapses, leaving only the permutation structure. The resulting parity automaton can thus be described as a *permutation automaton*. The method yields deterministic parity automata with $O(n!)$ states, compared to $O((n!)^2)$ in the Safra/Piterman construction. Crucially, we show that we obtain a similarly simplified determinization for limit-deterministic *parity* automata by translating into Büchi automata.

As indicated above, limit-deterministic parity automata are able to recognize bad branches in pre-tableaux for weakly aconjunctive μ-calculus formulas. Employing them in the standard construction of satisfiability games, we obtain *permutation games* in which nodes from the pre-tableau are annotated with a partial permutation (i.e. a non-repetitive list) of (levelled) formulas. A parity condition is used to detect indices in the permutation that are active infinitely often without ever being removed from the permutation. The resulting parity games are of size $O((nk)!)$ and have $O(nk)$ priorities; as a side result, we thus obtain a new bound $O((nk)!)$ on model size for weakly aconjunctive formulas.

The resulting decision procedure generalizes to the weakly aconjunctive *coalgebraic* μ-calculus, thus covering also, e.g., probabilistic and alternating-time versions of the μ-calculus. The generic algorithm has been implemented as an extension of the *Coalgebraic Ontology Logic Reasoner* (COOL) [11,13]. Our implementation constructs and solves the presented permutation games *on-the-fly*, possibly finishing satisfiability proofs early, and shows promising initial results. The content of the paper is structured as follows: We describe the determinization of limit-deterministic automata in Sect. 2 and the construction of permutation games in Sect. 3, and discuss implementation and evaluation in Sect. 4.

**Related Work.** Liu and Wang [17] give a tighter estimate $\mathcal{O}((n!)^2)$ for the number of states in Piterman's determinization [19]. Schewe [21] simplifies Piterman's construction (establishing the same bound as Liu and Wang). Tian and Duan [23] further improve Schewe's construction. Fisman and Lustig [7] present a modularization of Büchi determinization that is aimed mainly at easing understanding of the construction. Parity automata can be determinized by first converting them to Büchi automata and then applying Büchi determinization. Schewe and Varghese [22] address the direct determinization of parity automata (via Rabin automata), and prove optimality within a small constant factor, and even absolute optimality for the Büchi subcase. All these constructions and estimates concern unrestricted Büchi or parity automata. Recently, Safra-less determinization of limit-deterministic Büchi automata has been described in the context of controller synthesis for LTL [6]; the determinization method that we present in Sect. 2.2 has been devised independently from [6] but employs a very similar construction (yielding essentially the same results on the complexity of the construction).

The use of games in μ-calculus satisfiability checking goes back to Niwiński and Walukiewicz [18] and has since been extended to the unguarded μ-calculus [10] and the *coalgebraic* μ-calculus [2]. Game-based procedures for the relational μ-calculus have been implemented in MLSolver [9], and for the alternation-free coalgebraic μ-calculus in COOL [13].

### **2 Determinizing Limit-Deterministic Automata**

#### **2.1 Limit-Deterministic Automata**

We recall the basics of parity automata: A *parity automaton* is a tuple $A = (V, \Sigma, \delta, u_0, \alpha)$ where $V$ is a set of *states*, $\Sigma$ is an *alphabet*, $\delta \subseteq V \times \Sigma \times V$ is a *transition relation*, $u_0 \in V$ is an *initial state*, and $\alpha : \delta \to \mathbb{N}$ is a *priority function* that assigns natural numbers to *transitions* (assigning priorities to transitions rather than states yields a slightly more succinct notion of automata while retaining the computational properties of standard parity automata [22]). For $(v, a) \in V \times \Sigma$, we write $\delta(v, a) = \{u \mid (v, a, u) \in \delta\}$. The *index* $\mathsf{idx}(A) = \max\{\alpha(t) \mid t \in \delta\}$ of a parity automaton $A$ is its maximal priority. A *run* $\rho = v_0 v_1 \ldots$ of $A$ on an infinite word $w = a_0 a_1 \ldots \in \Sigma^\omega$ starting at $v \in V$ is a (possibly infinite) sequence of states $v_i$ such that $v_0 = v$ and for all $i \geq 0$, $v_{i+1} \in \delta(v_i, a_i)$. We see runs $\rho$ or words $w$ as functions from natural numbers to states $\rho(i) = v_i \in V$ or letters $w(i) = a_i \in \Sigma$. For a run $\rho$ on a word $w$, we define the corresponding sequence $\mathsf{trans}(\rho)$ of transitions by $\mathsf{trans}(\rho)(i) = (\rho(i), w(i), \rho(i+1))$. We denote the set of all runs of $A$ on a word $w$ starting at $v$ by $\mathsf{run}(A, v, w)$, or just by $\mathsf{run}(A, w)$ if $v = u_0$.
A run $\rho$ of $A$ on a word $w$ is *accepting* if the highest priority that occurs infinitely often in it (notation: $\max(\mathsf{Inf}(\alpha \circ \mathsf{trans}(\rho)))$; we generally write $\mathsf{Inf}(s)$ for the set of elements occurring infinitely often in a sequence $s$) is even. A parity automaton $A$ *accepts* an infinite word $w$ if $\mathsf{run}(A, w)$ contains an accepting run, and we denote by $L(A) \subseteq \Sigma^\omega$ the set of all words that are accepted by $A$.
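The acceptance condition can be made concrete for ultimately periodic words. The following Python sketch is an illustration only: it assumes a *deterministic* automaton given as a dictionary from (state, letter) pairs to (successor, priority) pairs, and the names `accepts_lasso` and `DELTA` as well as the example automaton are hypothetical and not taken from the paper. It follows the unique run on a word of the form prefix · loop^ω until the state at a loop boundary repeats, and then tests whether the highest priority seen in the recurring part is even.

```python
from typing import Dict, List, Tuple

State = str
# Deterministic transition table: (state, letter) -> (successor, priority).
Delta = Dict[Tuple[State, str], Tuple[State, int]]


def accepts_lasso(delta: Delta, init: State, prefix: str, loop: str) -> bool:
    """Decide whether the unique run on prefix . loop^omega is accepting."""
    assert loop, "the loop must be non-empty"
    state = init
    # Read the finite prefix; a missing transition means there is no run.
    for letter in prefix:
        if (state, letter) not in delta:
            return False
        state, _ = delta[(state, letter)]
    seen: Dict[State, int] = {}   # state at a loop boundary -> round index
    round_max: List[int] = []     # highest priority seen in each round
    # Iterate the loop until the state at a loop boundary repeats.
    while state not in seen:
        seen[state] = len(round_max)
        highest = -1
        for letter in loop:
            if (state, letter) not in delta:
                return False
            state, prio = delta[(state, letter)]
            highest = max(highest, prio)
        round_max.append(highest)
    # Exactly the rounds from the first visit of the repeated state onwards
    # recur forever, so they determine max(Inf(alpha o trans(rho))).
    recurring = round_max[seen[state]:]
    return max(recurring) % 2 == 0


# Hypothetical two-state example automaton (not from the paper).
DELTA: Delta = {
    ("s", "a"): ("s", 1),
    ("s", "b"): ("t", 2),
    ("t", "a"): ("s", 1),
    ("t", "b"): ("t", 3),
}
```

In this example the word (ab)^ω is accepted, since priority 2 dominates the loop, whereas b^ω is rejected because the loop at `t` has priority 3.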

Given a state $v \in V$ and a letter $a \in \Sigma$, we define $\delta|_{v,a} = \{(v, a, u) \mid u \in \delta(v, a)\}$. Given a set $\gamma \subseteq \delta$ of transitions, a state $v \in V$, a set of states $U \subseteq V$ and a letter $a \in \Sigma$, we put $\gamma(U, a) = \bigcup\{\gamma(v, a) \mid v \in U\}$; given a finite word $w = a_0 \ldots a_n$, we then recursively define $\gamma(v, w) = \gamma(\gamma(v, a_0), a_1 \ldots a_n)$, obtaining the set of all states reachable from $v$ when reading $w$ while only using transitions from $\gamma$. For $U \subseteq V$, $\gamma \subseteq \delta$ and $w \in \Sigma^*$, we put $\gamma(U, w) = \bigcup\{\gamma(u, w) \mid u \in U\}$. Furthermore, we define the set of states that are *reachable* from a node $v \in V$ using transitions from $\gamma$ as $\mathsf{reach}_\gamma(v) = \bigcup\{\gamma(v, w) \mid w \in \Sigma^*\}$; we extend this notation to sets of nodes, putting $\mathsf{reach}_\gamma(U) = \bigcup\{\mathsf{reach}_\gamma(u) \mid u \in U\}$ for $U \subseteq V$. If $\gamma = \delta$, then we omit the subscripts. A state $v \in V$ is said to be *deterministic* (in $\gamma \subseteq \delta$) if it has at most one ($\gamma$-)successor for each letter $a \in \Sigma$. A set $U \subseteq V$ is deterministic (in $\gamma \subseteq \delta$) if every state $v \in U$ is deterministic (in $\gamma$). The automaton $A$ is said to be *deterministic* if $V$ is deterministic; the transition relation in deterministic automata hence is a partial function (since such automata can be transformed to equivalent automata with total transition function, this definition suffices for purposes of determinization). We put $\alpha(i) = \{t \in \delta \mid \alpha(t) = i\}$ and $\alpha_{\leq}(i) = \{t \in \delta \mid \alpha(t) \leq i\}$.

A *Büchi automaton* is a parity automaton with only the priorities 1 and 2; the set of *accepting transitions* then is $F = \alpha(2)$ and a run is accepting if it passes infinitely many accepting transitions. For Büchi automata, we assume w.l.o.g. that every transition $t \in F$ is part of a cycle. We use the abbreviations (N/D)PA, (N/D)BA to denote the different types of automata.

Our notion of limit-determinism of automata is defined as a semantic property:

**Definition 1 (Limit-deterministic parity automata).** A PA $A = (V, \Sigma, \delta, u_0, \alpha)$ is *limit-deterministic* if there is, for each word $w$ and each accepting run $\rho \in \mathsf{run}(A, w)$, a number $i$ such that for all $j \geq i$, $\delta|_{\rho(j),w(j)} \cap \alpha_{\leq}(l) = \{\mathsf{trans}(\rho)(j)\}$, where $l = \max(\mathsf{Inf}(\alpha \circ \mathsf{trans}(\rho)))$.

If $A$ is a BA, then we have $\max(\mathsf{Inf}(\alpha \circ \mathsf{trans}(\rho))) = 2$ for every accepting run $\rho$; as $\alpha_{\leq}(2) = \delta$, the above definition instantiates to requiring the existence of a number $i$ such that for all $j \geq i$, $\delta(\rho(j), w(j)) = \{\rho(j + 1)\}$.

**Definition 2 (Compartments).** Given a PA $A = (V, \Sigma, \delta, u_0, \alpha)$ with $k$ priorities, and an even number $l \leq k$, the $l$*-compartment* $C_l(t)$ of a transition $t \in \alpha(l)$ is the set $\mathsf{reach}_{\alpha_{\leq}(l)}(\pi_3(t))$ where $\pi_3$ projects transitions $t = (v, a, u)$ to their target nodes $u$. If $l$ is irrelevant, then we refer to $l$-compartments just as compartments. The *size* of a compartment $C$ is just $|C|$. A compartment $C$ is *internally deterministic* if for each $v \in C$ and all $a \in \Sigma$, $|\delta(v, a) \cap C| \leq 1$.

Note that the union of all $l$-compartments is $\mathsf{reach}_{\alpha_{\leq}(l)}(\pi_3[\alpha(l)])$. Compartments allow for a syntactic characterization of limit-determinism:

**Lemma 3.** *A PA is limit-deterministic if and only if all its compartments are internally deterministic.*

**Corollary 4.** *It is decidable in polynomial time whether a given automaton is limit-deterministic.*

Lemma 3 specializes to BA as follows: we have $\alpha(0) = \emptyset$, $\alpha_{\leq}(2) = \delta$ and $\alpha(2) = F$, so that the union of all 0-compartments is empty and that of all 2-compartments is $\mathsf{reach}(\pi_3[F])$; thus a BA is limit-deterministic if and only if $\mathsf{reach}(\pi_3[F])$ is deterministic. Such Büchi automata are also called *semideterministic* [3].
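The syntactic criterion of Lemma 3 translates directly into a reachability check, which is what makes the test of Corollary 4 polynomial. The following Python sketch illustrates this under the assumption that an automaton is represented as a dictionary from transitions (source, letter, target) to priorities; this representation, the function names and both example automata are hypothetical and not part of the paper.

```python
from collections import deque
from typing import Dict, FrozenSet, Set, Tuple

Transition = Tuple[str, str, str]   # (source, letter, target)


def reach(start: str, gamma: Set[Transition]) -> FrozenSet[str]:
    """States reachable from `start` using only transitions in `gamma`.
    The empty word counts, so `start` itself is included."""
    seen, todo = {start}, deque([start])
    while todo:
        v = todo.popleft()
        for (src, _, tgt) in gamma:
            if src == v and tgt not in seen:
                seen.add(tgt)
                todo.append(tgt)
    return frozenset(seen)


def is_limit_deterministic(alpha: Dict[Transition, int]) -> bool:
    """Check that every l-compartment (l even) is internally deterministic."""
    delta = set(alpha)
    for t, l in alpha.items():
        if l % 2 != 0:
            continue                                  # compartments exist for even l only
        below = {s for s in delta if alpha[s] <= l}   # alpha_<=(l)
        comp = reach(t[2], below)                     # the compartment C_l(t)
        # Internal determinism: |delta(v, a) ∩ C| <= 1 for all v in C and letters a.
        succ: Dict[Tuple[str, str], Set[str]] = {}
        for (v, a, u) in delta:
            if v in comp and u in comp:
                succ.setdefault((v, a), set()).add(u)
        if any(len(targets) > 1 for targets in succ.values()):
            return False
    return True


# Hypothetical Büchi automaton (priorities 1 and 2): state "0" is
# nondeterministic but lies outside reach(pi_3[F]) = {"1", "2"}.
BA_OK: Dict[Transition, int] = {
    ("0", "a", "0"): 1,
    ("0", "a", "1"): 1,
    ("1", "b", "2"): 2,
    ("2", "a", "1"): 1,
}
# Adding a second a-successor of "2" inside the compartment breaks the property.
BA_BAD: Dict[Transition, int] = {**BA_OK, ("2", "a", "2"): 1}
```

For a Büchi automaton the loop visits only the 2-compartments, so the function reduces to checking that $\mathsf{reach}(\pi_3[F])$ is deterministic with respect to successors inside that set.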

#### **2.2 Determinizing Limit-Deterministic Büchi Automata**

The Safra/Piterman construction [19,20] determinizes Büchi automata by means of so-called Safra trees, i.e. trees whose nodes are labelled with sets of states of the input automaton such that the label of a node is a proper superset of the union of all its children's labels. Additionally, the nodes are ordered by their age and upon each transition between Safra trees, the ages of the oldest nodes that are active and/or removed during this transition determine the priority of the new Safra tree. In its original formulation, the Safra/Piterman construction adds new child nodes to the graph that are labelled with the accepting states in their parent's label. We observe that this step can be modified slightly, without affecting the correctness of the construction, by letting every accepting state from the parent's label receive its own separate child node; then the labels of newly created nodes are always singletons. Limit-determinism of the input automaton then implies that the node labels also *remain* singletons. Since singleton nodes do not have children in Safra trees, this leads to the collapse of their tree structure; the resulting data structure is essentially a partial permutation, i.e. a non-repetitive list, of states (ordered by their age). The resulting modified Safra/Piterman construction for the limit-deterministic case boils down to the following method, which (a) has a relatively short presentation and a simpler correctness proof than the full Safra/Piterman construction, and (b) results in asymptotically smaller automata; the underlying idea of the construction was first described in the context of controller synthesis for LTL [6].

**Definition 5 (Partial permutations).** Given a set $U$ of states, let $\mathsf{pperm}(U)$ denote the set of *partial permutations* over $U$, i.e. the set of non-repetitive lists $l = [v_1, \ldots, v_n]$ with $v_i \neq v_j$ for $i \neq j$ and $v_i \in U$, for all $1 \leq i \leq n$. We denote the $i$-th element in $l$ by $l(i) = v_i$, the empty partial permutation by $[\,]$ and the length of a partial permutation $l$ by $|l|$.
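To get a feeling for how many partial permutations exist, the following Python sketch enumerates and counts them. The function names are ours; the closed form used in `count_pperm` is the elementary identity that a set of size $n$ has $n!/(n-k)!$ non-repetitive lists of length $k$. Summing over $k$ gives $n! \sum_{j=0}^{n} 1/j! < n!\,e$, which is the kind of factorial bound that appears for the permutation component of the state space.

```python
from itertools import permutations
from math import e, factorial
from typing import Iterable, List


def pperm(states: Iterable) -> List[list]:
    """All partial permutations (non-repetitive lists) over `states`."""
    pool = list(states)
    return [list(p) for k in range(len(pool) + 1) for p in permutations(pool, k)]


def count_pperm(n: int) -> int:
    """Number of partial permutations over an n-element set."""
    return sum(factorial(n) // factorial(n - k) for k in range(n + 1))


# Example: the five partial permutations over {1, 2}.
EXAMPLE = pperm([1, 2])   # [], [1], [2], [1, 2], [2, 1]
```

Note that this counts only the list component; the states of the determinized automaton additionally carry a set of states.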

**Definition 6 (Determinization of limit-deterministic BA).** Fix a limit-deterministic BA $A = (V, \Sigma, \delta, u_0, F)$, and put $Q = \mathsf{reach}(\pi_3[F])$, $\overline{Q} = V \setminus Q$, $q = |Q|$. Define the DPA $B = (W, \Sigma, \delta', w_0, \alpha)$ by putting $W = \mathcal{P}(\overline{Q}) \times \mathsf{pperm}(Q)$, $w_0 = (\{u_0\}, [\,])$ if $u_0 \in \overline{Q}$, $w_0 = (\emptyset, [u_0])$ if $u_0 \in Q$ and for $g = (U, l) \in W$ and $a \in \Sigma$, $\delta'(g, a) = h$, where $h = (\delta(U, a) \cap \overline{Q}, l')$ and where $l'$ is constructed from $l = [v_1, \ldots, v_m]$ as follows:

1. Define a list $t$ of length $m$ over $Q \cup \{*\}$ (with $*$ representing undefinedness) in which $t(i) = w$ if $\delta(v_i, a) = \{w\}$, and $t(i) = *$ if $\delta(v_i, a) = \emptyset$.


Temporarily, $t$ may contain duplicate or undefined entries, but Steps 2 and 3 ensure that in the end, $t$ is a partial permutation of length at most $q$. Let $r$ (for 'removed') denote the lowest index $i$ such that $t(i) = *$ after Step 2. Let $a$ (for 'active') denote the lowest index $i$ such that $(l(i), a, l'(i)) \in F$. If $r > |l'|$ and there is no $i$ with $(l(i), a, l'(i)) \in F$, then put $\alpha(g, a, h) = 1$. Otherwise, put

$$\alpha(g, a, h) = \begin{cases} 2(q - r) + 3 & \text{if } r \le a \\ 2(q - a) + 2 & \text{if } r > a. \end{cases}$$

**Theorem 7.** *We have $L(A) = L(B)$, and $B$ has at most $2n + 1$ priorities; for $n \geq 4$, we have $|W| \leq n!\,e$.*

**Corollary 8.** *Limit-deterministic Büchi automata of size $n$ can be determinized to deterministic parity automata of size $\mathcal{O}(n!)$ and with $\mathcal{O}(n)$ priorities.*

**Example 9.** Consider the limit-deterministic BA $A$ depicted below and the determinized DPA $B$ that is constructed from it by applying the method. We see by Lemma 3 that $A$ is indeed limit-deterministic: we have $F = \{(1, b, 3)\}$, i.e. the $b$-transition from state 1 to state 3 (depicted with a boxed transition label) is the only accepting transition; thus we have $Q = \mathsf{reach}(\pi_3[F]) = \{1, 3\}$ (so $\overline{Q} = \{0, 2\}$), and the states 1 and 3 are deterministic. Moreover, $L(A) = L(B) = a(a|b)^{+}(a^{+}b)^{\omega}$.

Notice that in $B$, there is a $b$-transition with priority 1 from the initial state to the sink state $(\emptyset, [\,])$ and an $a$-transition to $(\{0, 2\}, [1])$; as $1 \in Q$ but $1 \notin F$, this transition has priority 1. A further $b$-transition leads from 1 to 3 in $A$; in $B$, we have a $b$-transition from $(\{0, 2\}, [1])$ to $(\{2\}, [3])$ and since $(1, b, 3) \in F$, the first position in the permutation component is active during this transition so that the transition has priority 4. Yet another $b$-transition loops from $(\{2\}, [3])$ to $(\{2\}, [3])$. Since there is no $b$-transition starting at state 3, the first element in the permutation is removed in Step 1 of the construction. Since there is a $b$-transition from 2 to 3, it is added to the permutation again in Step 4 of the construction. Crucially, however, the priority of the transition is 5, since the first item of the permutation has been (temporarily) removed. The intuition is that the trace of 3 ends when the letter $b$ is read; even though a new trace of 3 immediately starts, we do not consider it to be the same trace as the previous one. Thus the transition obtains priority 5 so that it may be used only finitely often in an accepting run of $B$, i.e. accepting runs contain an uninterrupted trace that visits state 3 infinitely often. Thus two or more consecutive $b$'s can only occur finitely often in any accepted word.
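The priorities quoted in this example can be recomputed from the case distinction at the end of Definition 6. The following Python sketch covers only that priority assignment, not the construction of the permutation itself. The function name and the use of `None` for "no such index" are our own conventions, and we assume that a missing active index counts as larger than any removed index.

```python
from typing import Optional


def transition_priority(q: int, removed: Optional[int], active: Optional[int]) -> int:
    """Priority of a transition of the determinized automaton.

    q       -- the number |Q| of states reachable from accepting transitions
    removed -- lowest index (1-based) that is removed, or None if there is none
    active  -- lowest index (1-based) that is active, or None if there is none
    """
    if removed is None and active is None:
        return 1
    # Assumption: with no active index, the 'removed' case applies.
    if active is None or (removed is not None and removed <= active):
        return 2 * (q - removed) + 3      # odd: a trace has been interrupted
    return 2 * (q - active) + 2           # even: an accepting transition was taken


# Values from Example 9, where q = |{1, 3}| = 2.
Q_SIZE = 2
```

With `Q_SIZE = 2`, an active first position without removal yields priority 4, and removal of the first position yields priority 5, matching the two $b$-transitions discussed above; lower indices (older entries) always yield higher priorities.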

#### **2.3 Determinizing Limit-Deterministic Parity Automata**

To determinize limit-deterministic PA, it suffices to transform them to equivalent limit-deterministic BA and determinize the BA. This transformation from PA to BA is achieved by a construction which is inspired by Theorems 2 and 3 in [14]; we add the observation that the construction preserves limit-determinism.

**Definition 10.** Given a limit-deterministic PA $C = (V, \Sigma, \delta, u_0, \alpha)$ with $n = |V|$ and $k > 2$ priorities, we define the limit-deterministic BA $D = (W, \Sigma, \delta', u_0, F)$ by putting $W = V \cup (V \times \{0, \ldots, \frac{k-1}{2}\})$, and for $v \in W$ and $a \in \Sigma$,

$$\delta'(v,a) = \begin{cases} \{(w,m) \mid (v,a,w) \in \alpha(2m)\} \cup \delta(v,a) & \text{if } v \in V\\ \{(w,l) \mid (v',a,w) \in \alpha\_{\le}(2l)\} & \text{if } v = (v',l) \notin V \end{cases}$$

Finally, we put $F = \{((v, l), a, (w, l)) \in \delta' \mid \alpha(v, a, w) = 2l\}$. To see that $D$ is limit-deterministic, it suffices by Lemma 3 to show that $\mathsf{reach}(\pi_3[F])$ is deterministic. We observe that for each state $(w, l) \in \mathsf{reach}(\pi_3[F])$, $(w, l)$ is deterministic by definition of $\delta'$ since $w$ is contained in a (by Lemma 3, internally deterministic) $2l$-compartment of $C$.

**Lemma 11.** *We have $L(C) = L(D)$ and $|W| \leq n(\frac{k}{2} + 1) \leq nk$.*

By Theorem 7, $D$ can be determinized to a DPA $E$ of size at most $(nk)!\,e$, with at most $nk + 2$ priorities and with $L(D) = L(E)$.

**Corollary 12.** *Limit-deterministic parity automata of size $n$ with $k$ priorities can be determinized to deterministic parity automata of size $\mathcal{O}((nk)!)$ and with $\mathcal{O}(nk)$ priorities.*
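The translation of Definition 10 can be sketched in a few lines of Python. This is an illustration under assumptions rather than a reference implementation: the parity automaton is given as a dictionary from transitions to priorities, the function name `pa_to_ba` is ours, and levels are only created for even priorities that actually occur in the input (the definition fixes a range of levels in advance).

```python
from typing import Dict, Set, Tuple, Union

PState = str
BState = Union[PState, Tuple[PState, int]]      # plain state or (state, level)
PTrans = Tuple[PState, str, PState]
BTrans = Tuple[BState, str, BState]


def pa_to_ba(alpha: Dict[PTrans, int]) -> Tuple[Set[BTrans], Set[BTrans]]:
    """Return (delta', F) of the Büchi automaton built from a parity automaton."""
    # Assumption: one level l per even priority 2l that occurs in the input.
    levels = sorted({p // 2 for p in alpha.values() if p % 2 == 0})
    delta_ba: Set[BTrans] = set()
    accepting: Set[BTrans] = set()
    for (v, a, w), p in alpha.items():
        # First case of delta': plain states keep their original moves ...
        delta_ba.add((v, a, w))
        # ... and may guess level m when taking a transition of priority 2m.
        if p % 2 == 0:
            delta_ba.add((v, a, (w, p // 2)))
        # Second case: at level l, only transitions of priority <= 2l survive.
        for l in levels:
            if p <= 2 * l:
                t: BTrans = ((v, l), a, (w, l))
                delta_ba.add(t)
                if p == 2 * l:
                    accepting.add(t)            # F: priority exactly 2l at level l
    return delta_ba, accepting


# Hypothetical parity automaton with priorities 1, 2 and 3 (not from the paper).
PA: Dict[PTrans, int] = {
    ("x", "a", "x"): 1,
    ("x", "b", "y"): 2,
    ("y", "a", "x"): 1,
    ("y", "b", "y"): 3,
}
DELTA_BA, ACCEPTING = pa_to_ba(PA)
```

In the example, the level-1 copy drops the priority-3 loop at `y`, so every state of the form `(state, 1)` has at most one successor per letter, which reflects the preservation of limit-determinism noted in the text.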

### **3 Permutation Games for the Aconjunctive** *µ***-Calculus**

#### **3.1 The** *µ***-Calculus**

We briefly recall the definition of the μ-calculus. We fix a set $P$ of *propositions*, a set $A$ of *actions*, and a set $V$ of fixpoint variables. The set $L_\mu$ of μ-calculus formulas is the set of all formulas $\phi$, $\psi$ that can be constructed by the grammar

$$\psi, \phi ::= \bot \mid \top \mid p \mid \neg p \mid X \mid \psi \land \phi \mid \psi \lor \phi \mid \langle a \rangle \psi \mid [a] \psi \mid \mu X. \psi \mid \nu X. \psi$$

where $p \in P$, $a \in A$, and $X \in V$; we write $|\psi|$ for the size of a formula $\psi$. Throughout the paper, we use $\eta$ to denote one of the fixpoint operators $\mu$ or $\nu$. We refer to formulas of the form $\eta X.\,\psi$ as *fixpoint literals*, to formulas of the form $\langle a \rangle \psi$ or $[a]\psi$ as *modal literals*, and to $p$, $\neg p$ as *propositional literals*. The operators $\mu$ and $\nu$ *bind* their variables, inducing a standard notion of *free variables* in formulas. We denote the set of free variables of a formula $\psi$ by $\mathsf{FV}(\psi)$. A formula $\psi$ is *closed* if $\mathsf{FV}(\psi) = \emptyset$, and *open* otherwise. We write $\psi \leq \phi$ ($\psi < \phi$) to indicate that $\psi$ is a (proper) subformula of $\phi$. We say that $\phi$ *occurs free* in $\psi$ if $\phi$ occurs in $\psi$ as a subformula that is not in the scope of any fixpoint operator. Throughout, we *restrict to formulas that are guarded*, i.e. have at least one modal operator between any occurrence of a variable $X$ and an enclosing binder $\eta X$. (This is standard although possibly not without loss of generality [10].) Moreover we assume w.l.o.g. that input formulas are *clean*, i.e. all fixpoint variables are mutually distinct and distinct from all free variables, and *irredundant*, i.e. $X \in \mathsf{FV}(\psi)$ for all subformulas $\eta X.\,\psi$. We refer to a variable $X$ that is bound by a least (greatest) fixpoint operator $\mu X.\,\chi$ ($\nu X.\,\chi$) in a formula $\phi$ as a μ*-variable* (ν*-variable*) of $\phi$, and to the process of substituting such an $X$ with its binding fixpoint literal ($\mu X.\,\chi$ or $\nu X.\,\chi$, respectively) as *unfolding*. An occurrence of a subformula $\psi$ of a formula $\phi$ *contains an active* μ*-variable* [15] if $\psi$ can be converted into a formula containing a free occurrence of a μ-variable of $\phi$ by repeatedly unfolding ν-variables of $\phi$.

Formulas are evaluated over *Kripke structures* $K = (W, (R_a)_{a \in A}, \pi)$, consisting of a set $W$ of *states*, a family $(R_a)_{a \in A}$ of relations $R_a \subseteq W \times W$, and a valuation $\pi : P \to \mathcal{P}(W)$ of the propositions. Given an *interpretation* $i : V \to \mathcal{P}(W)$ of the fixpoint variables, define $[\![\psi]\!]_i \subseteq W$ by the obvious clauses for Boolean operators and propositions, $[\![X]\!]_i = i(X)$, $[\![\langle a \rangle \psi]\!]_i = \{v \in W \mid \exists w \in R_a(v).\ w \in [\![\psi]\!]_i\}$, $[\![[a]\psi]\!]_i = \{v \in W \mid \forall w \in R_a(v).\ w \in [\![\psi]\!]_i\}$, $[\![\mu X.\,\psi]\!]_i = \mu [\![\psi]\!]^X_i$ and $[\![\nu X.\,\psi]\!]_i = \nu [\![\psi]\!]^X_i$, where $R_a(v) = \{w \in W \mid (v, w) \in R_a\}$, $[\![\psi]\!]^X_i(G) = [\![\psi]\!]_{i[X \mapsto G]}$, and $\mu$, $\nu$ take least and greatest fixpoints of monotone functions, respectively. If $\psi$ is closed, then $[\![\psi]\!]_i$ does not depend on $i$, so we just write $[\![\psi]\!]$. We denote the *Fischer-Ladner closure* [16] of a formula $\phi$ by $\mathbf{F}(\phi)$, or just by $\mathbf{F}$, if no confusion arises; intuitively, $\mathbf{F}$ is the set of formulas that can arise as subformulas when unfolding each fixpoint operator in $\phi$ at most once. We note $|\mathbf{F}| \leq |\phi|$ [16].
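On finite Kripke structures, the semantic clauses above can be executed directly, computing least and greatest fixpoints by iteration from the empty set and from the full state set, respectively. The Python sketch below is a naive evaluator meant only to illustrate the semantics. The tuple encoding of formulas, the function name `evaluate` and the example structure are our own assumptions, and the sketch is unrelated to the game-based satisfiability procedure of the paper.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Set, Tuple


def evaluate(phi: tuple, states: Iterable[int],
             rel: Dict[str, Set[Tuple[int, int]]],
             val: Dict[str, Set[int]],
             env: Optional[Dict[str, FrozenSet[int]]] = None) -> FrozenSet[int]:
    """Compute the extension of `phi` under the interpretation `env`.

    Formulas are nested tuples, e.g. ("mu", "X", ("dia", "a", ("var", "X"))).
    """
    env = env or {}
    universe = frozenset(states)
    op = phi[0]
    if op == "true":
        return universe
    if op == "false":
        return frozenset()
    if op == "prop":
        return frozenset(val.get(phi[1], ()))
    if op == "nprop":                           # negated proposition
        return universe - frozenset(val.get(phi[1], ()))
    if op == "var":
        return env[phi[1]]
    if op in ("and", "or"):
        left = evaluate(phi[1], states, rel, val, env)
        right = evaluate(phi[2], states, rel, val, env)
        return left & right if op == "and" else left | right
    if op in ("dia", "box"):
        target = evaluate(phi[2], states, rel, val, env)
        edges = rel.get(phi[1], ())

        def succ(v: int) -> Set[int]:           # R_a(v)
            return {w for (x, w) in edges if x == v}

        if op == "dia":
            return frozenset(v for v in universe if succ(v) & target)
        return frozenset(v for v in universe if succ(v) <= target)
    if op in ("mu", "nu"):
        # Iteration from bottom (mu) or top (nu); it terminates because the
        # state set is finite and the formula is monotone in its variable.
        current = frozenset() if op == "mu" else universe
        while True:
            nxt = evaluate(phi[2], states, rel, val, {**env, phi[1]: current})
            if nxt == current:
                return current
            current = nxt
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical structure: 0 -> 1 -> 2 -> 2 and 3 -> 3, with p true at 2 only.
STATES = {0, 1, 2, 3}
REL = {"a": {(0, 1), (1, 2), (2, 2), (3, 3)}}
VAL = {"p": {2}}
REACH_P = ("mu", "X", ("or", ("prop", "p"), ("dia", "a", ("var", "X"))))
STAY_P = ("nu", "X", ("and", ("prop", "p"), ("dia", "a", ("var", "X"))))
```

Here `REACH_P` encodes $\mu X.\,(p \vee \langle a \rangle X)$ (some path reaches $p$) and `STAY_P` encodes $\nu X.\,(p \wedge \langle a \rangle X)$ (some infinite path stays in $p$).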

The *aconjunctive fragment* [15] of the μ-calculus is obtained by requiring that for all conjunctions that occur as a subformula, at most one of the conjuncts contains an active μ-variable. In the *weakly aconjunctive fragment* [24], this requirement is loosened to the constraint that all conjunctions that occur as a subformula and contain an active μ-variable are of the shape $\psi \wedge \Diamond\psi_1 \wedge \ldots \wedge \Diamond\psi_n \wedge \Box(\psi_1 \vee \ldots \vee \psi_n)$, where $\psi$ does not contain active μ-variables. For instance, for all $n$, the formula $\eta X_n \ldots \mu X_1.\,\nu X_0.\,\bigvee_{0 \leq i \leq n}(q_i \wedge \Diamond X_i)$ is aconjunctive (and equivalent to the weakly aconjunctive formula obtained by replacing $\Diamond X_i$ with $\Diamond X_i \wedge \Diamond\top \wedge \Box(X_i \vee \top)$). The permutation satisfiability games that we introduce work for the more expressive weakly aconjunctive fragment.

We will make use of the standard *tableau rules* [10] (each consisting of one *premise* and a possibly empty set of *conclusions*):

$$(\bot)\ \ \frac{\Gamma, \bot}{} \qquad\qquad (\text{↯})\ \ \frac{\Gamma, p, \neg p}{} \qquad\qquad (\wedge)\ \ \frac{\Gamma, \psi \wedge \phi}{\Gamma, \psi, \phi}$$

$$(\vee)\ \ \frac{\Gamma, \psi \vee \phi}{\Gamma, \psi \qquad \Gamma, \phi} \qquad\qquad (\langle a \rangle)\ \ \frac{\Gamma, [a]\psi_1, \ldots, [a]\psi_n, \langle a \rangle \phi}{\psi_1, \ldots, \psi_n, \phi} \qquad\qquad (\eta)\ \ \frac{\Gamma, \eta X.\,\psi}{\Gamma, \psi[X \mapsto \eta X.\,\psi]}$$

(for $a \in A$, $p \in P$); we refer to the tableau rules by $\mathcal{R}$ and usually write rule applications with premise $\Gamma$ and conclusion $\Sigma = \Gamma_1, \ldots, \Gamma_n$ sequentially: $(\Gamma / \Sigma)$.

To track fixpoint formulas through pre-tableaux, we will use deferrals, that is, the decomposed form of formulas that are obtained by unfolding fixpoint literals.

**Definition 13 (Deferrals).** Given fixpoint literals $\chi_i = \eta X_i.\,\psi_i$, $i = 1, \ldots, n$, we say that a substitution $\sigma = [X_1 \mapsto \chi_1]; \ldots; [X_n \mapsto \chi_n]$ *sequentially unfolds* $\chi_n$ if $\chi_i <_f \chi_{i+1}$ for all $1 \leq i < n$, where we write $\psi <_f \eta X.\,\phi$ if $\psi \leq \phi$ and $\psi$ is open and occurs free in $\phi$ (i.e. $\sigma$ unfolds a nested sequence of fixpoints in $\chi_n$ innermost-first). We say that a formula $\chi$ is *irreducible* if for every substitution $[X_1 \mapsto \chi_1]; \ldots; [X_n \mapsto \chi_n]$ that sequentially unfolds $\chi_n$, we have that $\chi = \chi_1([X_2 \mapsto \chi_2]; \ldots; [X_n \mapsto \chi_n])$ implies $n = 1$ (i.e. $\chi = \chi_1$). A formula $\psi$ *belongs* to an irreducible closed fixpoint literal $\theta_n$, or is a $\theta_n$*-deferral*, if $\psi = \alpha\sigma$ for some substitution $\sigma = [X_1 \mapsto \theta_1]; \ldots; [X_n \mapsto \theta_n]$ that sequentially unfolds $\theta_n$ and some $\alpha <_f \theta_1$. We denote the set of $\theta_n$-deferrals by $\mathsf{dfr}(\theta_n)$.

E.g. the substitution $\sigma = [Y \mapsto \mu Y.\,(X \wedge \Diamond\Diamond Y)]; [X \mapsto \theta]$ sequentially unfolds the irreducible closed formula $\theta = \nu X.\,\mu Y.\,(X \wedge \Diamond\Diamond Y)$, and $(\Diamond Y)\sigma = \Diamond\mu Y.\,(\theta \wedge \Diamond\Diamond Y)$ is a $\theta$-deferral. A fixpoint literal is irreducible if it is not an unfolding $\psi[X \mapsto \eta X.\,\psi]$ of a fixpoint literal $\eta X.\,\psi$; in particular, every clean irredundant fixpoint literal is irreducible.

As a technical tool, we define a measure for the depth of alternation at which a deferral resides inside the fixpoint to which it belongs:

**Definition 14 (Alternation level and alternation depth).** The *alternation level* $\mathsf{al}(\phi\sigma) := \mathsf{al}(\sigma)$ of a deferral $\phi\sigma$ is defined inductively over $|\sigma|$, where $\mathsf{al}(\epsilon) = \mathsf{al}(\epsilon)_\mu = \mathsf{al}(\epsilon)_\nu = 0$, for the empty substitution $\epsilon$, $\mathsf{al}(\sigma; [X \mapsto \eta X.\,\psi]) = \mathsf{al}(\sigma)_\mu + 1$ if $\eta = \mu$ and $\mathsf{al}(\sigma; [X \mapsto \eta X.\,\psi]) = \mathsf{al}(\sigma)_\nu$ otherwise, and

$$\begin{aligned} \mathsf{al}(\sigma; [X \mapsto \eta X. \psi])\_{\mu} &= \begin{cases} \mathsf{al}(\sigma)\_{\mu} & \text{if } \eta = \mu \\ \mathsf{al}(\sigma)\_{\nu} + 1 & \text{otherwise} \end{cases} \\ \mathsf{al}(\sigma; [X \mapsto \eta X. \psi])\_{\nu} &= \begin{cases} \mathsf{al}(\sigma)\_{\nu} & \text{if } \eta = \nu \\ \mathsf{al}(\sigma)\_{\mu} + 1 & \text{otherwise} \end{cases} \end{aligned}$$

This definition assigns greater numbers to inner fixpoint literals, i.e. to deferrals which occur at higher nesting depth, i.e. with more alternation inside their sequence $\sigma$. Given a formula $\phi$, its *alternation depth* $\mathsf{ad}(\phi)$ is defined as $\mathsf{ad}(\phi) = \max\{\mathsf{al}(\delta) \mid \delta \in \mathbf{F}, \exists\theta.\ \delta \in \mathsf{dfr}(\theta)\}$.
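Since the alternation level depends only on the kinds of the fixpoints unfolded by $\sigma$, the mutual recursion of Definition 14 is easy to replay. The following Python sketch is an illustration under the assumption that $\sigma$ is abstracted to the list of its binder kinds, written innermost-first as in Definition 13; the function names are ours.

```python
from typing import Sequence


def _al_sub(binders: Sequence[str], kind: str) -> int:
    """The auxiliary value al(sigma)_kind for kind in {"mu", "nu"}."""
    if not binders:
        return 0
    *rest, last = binders
    # Same kind: recurse with that kind; other kind: switch index and add one.
    return _al_sub(rest, last) + (0 if last == kind else 1)


def alternation_level(binders: Sequence[str]) -> int:
    """al(sigma) for sigma = [X1 -> chi1]; ...; [Xn -> chin].

    `binders` lists the fixpoint kinds ("mu" or "nu") of chi1, ..., chin,
    i.e. innermost-first; the last entry is the outermost fixpoint.
    """
    if not binders:
        return 0
    *rest, last = binders
    if last == "mu":
        return _al_sub(rest, "mu") + 1
    return _al_sub(rest, "nu")
```

For instance, `alternation_level(["mu"])` is 1 and `alternation_level(["nu", "mu"])` is 2, while a lone greatest fixpoint has level 0; odd levels thus correspond to least fixpoints and even levels to greatest fixpoints.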

#### **3.2 Limit-Deterministic Tracking Automata**

As a first step towards deciding the satisfiability of a weakly aconjunctive μ-calculus formula φ, we now construct a tracking automaton that takes branches of (that is, infinite paths through) standard pre-tableaux for φ as input and accepts a branch if and only if it contains a least fixpoint formula whose satisfaction is deferred indefinitely on that branch. To this end, we import the following notions of threads and tableaux from [10]:

**Definition 15.** A *pre-tableau* for a formula $\phi$ is a graph the nodes of which are labelled with subsets of the Fischer-Ladner closure $\mathbf{F}$; the graph structure $L$ of a pre-tableau is constructed by applying tableau rules from $\mathcal{R}$ to the labels of nodes with the requirement that for each rule application $(\Gamma / \Sigma)$ to the label $\Gamma$ of a node $v$, there is a $w$ with $(v, w) \in L$ such that the label of $w$ is contained in $\Sigma$. Nodes whose labels are *saturated* (i.e. do not contain propositional or fixpoint operators) are called *states*. Formulas are tracked through rule applications by the *connectedness relation* ${\rightsquigarrow} \subseteq (\mathcal{P}(\mathbf{F}) \times \mathbf{F})^2$ that is defined by putting $\Phi, \phi \rightsquigarrow \Psi, \psi$ if and only if $\Psi$ is a conclusion of an application of a rule from $\mathcal{R}$ to $\Phi$ such that $\phi \in \Phi$, $\psi \in \Psi$, and the rule application transforms $\phi$ to $\psi$; if the rule application does not change $\phi$, then $\phi = \psi$. E.g. we have $\Phi, \psi_1 \wedge \psi_2 \rightsquigarrow \Psi, \psi_i$, where $i \in \{1, 2\}$ and $\Psi$ is obtained from $\Phi$ by applying the rule $(\wedge)$ to $\psi_1 \wedge \psi_2$. A *branch* $\Psi_0, \Psi_1, \ldots$ in a pre-tableau is a sequence of labels such that for all $i \geq 0$, $\Psi_{i+1}$ is an $L$-successor of $\Psi_i$. A *thread* on an infinite branch $\Psi_0, \Psi_1, \ldots$ is an infinite sequence $t = \psi_0, \psi_1, \ldots$ of formulas with $\Psi_0, \psi_0 \rightsquigarrow \Psi_1, \psi_1 \rightsquigarrow \ldots$. A μ*-thread* is a thread $t$ such that $\min(\mathsf{Inf}(\mathsf{al} \circ t))$ is odd, i.e. the outermost fixpoint literal that is unfolded infinitely often in $t$ is a least fixpoint literal. A *bad branch* is an infinite branch that contains a μ-thread.
A *tableau* for φ is a pre-tableau for φ that does not contain bad branches.

We import from [10] the well-known fact that the existence of tableaux in the sense defined above characterizes satisfiability. In [10], the result is shown for the more general *unguarded* μ-calculus; we note that the restriction to guarded formulas does not invalidate the theorem.

**Theorem 16 (**[10]**).** *A μ-calculus formula $\psi$ is satisfiable if and only if there is a tableau for $\psi$.*

Given a formula $\phi$, we define the alphabet $\Sigma_\phi$ to consist of letters that each identify a rule $R \in \mathcal{R}$, a principal formula from $\mathbf{F}$ and one of the conclusions of $R$. E.g. the letter $((\vee), 0, p \vee \Diamond q)$ identifies the application of the disjunction rule to a principal formula $p \vee \Diamond q$ and the choice of the left conclusion; thus this letter identifies the transition from $p \vee \Diamond q$ to $p$ by use of rule $(\vee)$. We note $|\Sigma_\phi| \in \mathcal{O}(|\phi|)$. Further, we denote the set of all words that encode some branch and some bad branch in some pre-tableau for $\phi$ by $\mathsf{Branch}(\phi)$ and $\mathsf{BadBranch}(\phi)$, respectively.

As a crucial result, we now show that limit-deterministic automata are expressive enough to exactly recognize the bad branches in pre-tableaux for weakly aconjunctive formulas.

**Lemma 17.** *Let $\phi$ be a weakly aconjunctive formula. Then there is a* limit-deterministic *PA $A = (V, \Sigma_\phi, \delta, \phi, \alpha)$ with $|V| \leq |\phi|$ and $\mathsf{idx}(A) \leq \mathsf{ad}(\phi) + 1$ such that $L(A) \cap \mathsf{Branch}(\phi) = \mathsf{BadBranch}(\phi)$.*

*Proof (Sketch).* The automaton nondeterministically guesses formulas to be tracked, one at a time; the set of states of the automaton is the Fischer-Ladner closure of $\phi$. The priorities of the transitions in the automaton are derived from the alternation level of the target formula of the respective transition; then every word $w \in L(A)$ that encodes some branch encodes a bad branch. Once a deferral is tracked, weak aconjunctivity implies that all compartments to which the tracked formula belongs are internally deterministic; this is the case since for conjunctions $\psi = \psi_0 \wedge \Diamond\psi_1 \wedge \ldots \wedge \Diamond\psi_n \wedge \Box(\psi_1 \vee \ldots \vee \psi_n)$ (the only case that can introduce nondeterminism) each next modal step determines just one of the formulas $\psi_i$ that has to be tracked; the conjunct $\psi_0$ does not contain active μ-variables, so tracking it causes the automaton to leave all compartments to which $\psi$ belongs. Thus the automaton is limit-deterministic.

**Example 18.** We consider the aconjunctive formula

$$\phi = \mu X.\,(p \wedge \nu Y.\,(\Diamond(Y \wedge p) \vee \Diamond X))$$

which expresses the existence of a finite or infinite path on which $p$ holds everywhere. We have the $\phi$-deferrals $\phi$, $\psi := (p \wedge \nu Y.\,(\Diamond(Y \wedge p) \vee \Diamond X))\sigma_1$, $\theta := (\nu Y.\,(\Diamond(Y \wedge p) \vee \Diamond X))\sigma_1$, $\chi := (\Diamond(Y \wedge p) \vee \Diamond X)\sigma_2$, $(\Diamond(Y \wedge p))\sigma_2$, $\varsigma := (Y \wedge p)\sigma_2$, $Y\sigma_2$, $\Diamond X\sigma_2$ and $X\sigma_2$, where $\sigma_1 = [X \mapsto \phi]$ and $\sigma_2 = [Y \mapsto \psi]; \sigma_1$. We consider a pre-tableau $P_\phi$ for $\phi$ and, as in the proof of Lemma 17, we construct the limit-deterministic tracking automaton $A_\phi$, depicted below:

Pφ:

The priorities in A<sub>φ</sub> are derived as follows: As ad(φ) = 2 is even, we put k = ad(φ) + 1 = 3; since al(φ) = al(ψ) = 1, α(φ, (μ), ψ) = α(♦φ, (♦), φ) = k − al(φ) = 2 and since al(p) = 0, α(ψ, (∧), p) = α(τ, (∧), p) = k − al(p) = 3. All other formulas have alternation level 2 and transitions to them obtain priority 1. The tracking automaton accepts exactly those branches in P<sub>φ</sub> that start at node **1** and take the loop through node **9** infinitely often; in these branches, φ can be tracked forever and evolves to φ infinitely often, i.e. their dominating formula is the least fixpoint formula φ. All other branches loop through node **7** without passing node **9** from some point on; their dominating fixpoint formula is θ, a greatest fixpoint formula. We observe that due to the aconjunctivity of φ, A<sub>φ</sub> is limit-deterministic since the only two nondeterministic states ψ and τ each have only one outgoing (∧)-transition with priority less than k = 3.

Given a weakly aconjunctive formula φ, we use Lemma 17 to construct a limit-deterministic tracking automaton A<sub>φ</sub> with L(A<sub>φ</sub>) ∩ Branch(φ) = BadBranch(φ). Then we put Lemma 11 to use to obtain an equivalent BA in which all states from Q = reach(π<sub>3</sub>[F]) are *levelled deferrals*, i.e. pairs (ψ, q) consisting of a deferral ψ and a number q ≤ k/2, the *level* of the pair (ψ, q); the level q encodes the odd alternation level 2q − 1. A levelled deferral (ψ, q) is *active* if al(ψ) = 2q − 1 and the automaton accepts branches which contain a levelled deferral that is active infinitely often without being finished. The set Q is just a subset of **F**. Next we use Theorem 7 to transform this BA to a DPA B<sub>φ</sub> with L(A<sub>φ</sub>) = L(B<sub>φ</sub>). We complement B<sub>φ</sub> to a DPA C<sub>φ</sub> = (W, Σ<sub>φ</sub>, δ, φ, α) by decreasing the priority of each state in B<sub>φ</sub> by one; L(C<sub>φ</sub>) is then the complement of L(B<sub>φ</sub>), that is, C<sub>φ</sub> accepts exactly those words that encode only 'good' branches, if they encode some branch in some pre-tableau for φ. By construction, |W| ∈ O((nk)!) and C<sub>φ</sub> has at most nk + 1 priorities, and (recalling Definitions 6 and 10) the states in the carrier W of C<sub>φ</sub> are of the shape (U, l), where U is a subset of **F** and l is a partial permutation of levelled deferrals. For a transition t = ((U, l), r, (V, l′)) with (U, l), (V, l′) ∈ W, r ∈ Σ<sub>φ</sub>, if α(t) = 2(n−a)+1, then a is the lowest number such that al(φ) = 2q−1, where l(a) = (φ, q) and the a-th element of l is not removed by the transition t (i.e. α(t) references the oldest levelled deferral in l that is active but not removed by the transition t) and if α(t) = 2(n−r)+2, then α(t) is the index of the oldest levelled deferral (φ, 2q − 1) that is finished (i.e. removed from l) in the transition t of the automaton C<sub>φ</sub>, which means that the according r-transition in A<sub>φ</sub> makes φ leave its (2q − 1)-compartment. For a state v = (U, l), we define the *label* Γ(v) of v as Γ(v) = U.

#### **3.3 Permutation Games**

The deterministic parity automaton C<sub>φ</sub> can now be combined with applications of tableau rules from R to form a satisfiability game for φ. We proceed to recall the definition of parity games and some ensuing basic notions. A *parity game* is a graph G = (V, E, α) that consists of a set of nodes V, a set of edges E ⊆ V × V and a priority function α : E → ℕ, assigning priorities to *edges*. We assume V = V<sub>∃</sub> ⊎ V<sub>∀</sub>, that is, every node in V either belongs to player Eloise (V<sub>∃</sub>) or to player Abelard (V<sub>∀</sub>). A *play* ρ of G is a (possibly infinite) sequence v<sub>0</sub>v<sub>1</sub> ... such that for all i ≥ 0, v<sub>i</sub> ∈ V and (v<sub>i</sub>, v<sub>i+1</sub>) ∈ E. A play ρ of G is won by Eloise if and only if ρ is finite and ends in a node that belongs to Abelard or ρ is infinite and max(Inf(α ∘ trans(ρ))) is even (where trans(ρ) is defined by trans(ρ)(i) = (ρ(i), ρ(i + 1))); Abelard wins a play ρ if and only if Eloise does not win ρ. A (memoryless) strategy s : V → V assigns moves to states. A play ρ *conforms* to a strategy s if for all ρ(i) ∈ dom(s), ρ(i + 1) = s(ρ(i)). Eloise has a winning strategy for a node v if there is a strategy s : V<sub>∃</sub> → V such that every play of G that starts at v and conforms to s is won by Eloise; we have a dual notion of winning strategies for Abelard. The winning regions win<sub>∃</sub>(G) and win<sub>∀</sub>(G) are the sets of those nodes for which Eloise and Abelard have winning strategies, respectively. *Solving* a parity game G (locally) for a particular node v ∈ V amounts to computing the winner of v.
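To illustrate the winning condition just recalled, the following Python sketch evaluates who wins a finite play and who wins an ultimately periodic infinite play in a game with edge priorities. It is our own illustration and not taken from COOL or PGSolver; the class `ParityGame` and its method names are hypothetical, and the infinite play is assumed to be given by the cycle it eventually repeats.

```python
from dataclasses import dataclass


@dataclass
class ParityGame:
    """Parity game with priorities on edges (hypothetical helper class)."""
    eloise_nodes: set   # nodes owned by Eloise
    abelard_nodes: set  # nodes owned by Abelard
    priority: dict      # maps each edge (v, w) to a natural number

    def eloise_wins_finite(self, play):
        # A finite play is won by Eloise iff it ends in a node of Abelard.
        return play[-1] in self.abelard_nodes

    def eloise_wins_lasso(self, cycle):
        # For an infinite play u . cycle^omega, exactly the edges on the
        # cycle occur infinitely often, so the finite prefix u is irrelevant.
        edges = list(zip(cycle, cycle[1:] + cycle[:1]))
        return max(self.priority[e] for e in edges) % 2 == 0


# A two-node example game (assumed data, for illustration only).
game = ParityGame(
    eloise_nodes={"a"},
    abelard_nodes={"b"},
    priority={("a", "b"): 1, ("b", "a"): 2, ("b", "b"): 1},
)
```

In this example, a play that alternates between `a` and `b` forever sees the priorities 1 and 2 infinitely often and is won by Eloise, whereas a play that eventually stays in the self-loop at `b` is won by Abelard.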

Now we are ready to define permutation games for weakly aconjunctive formulas φ, using the DPA C<sub>φ</sub> = (W, Σ<sub>φ</sub>, δ, φ, α) from the previous section.

**Definition 19 (Permutation games).** Let φ be a weakly aconjunctive formula. We define the *permutation game* G(φ) = (W, E, β) to be a parity game that has the carrier of C<sub>φ</sub> as set of nodes. For every node v ∈ W for which Γ(v) is not a state, we fix a single rule that is to be applied to Γ(v) and a single principal formula ψ<sub>v</sub> ∈ Γ(v) to which the rule is to be applied. If (∨) is to be applied to Γ(v), then we put v ∈ W<sub>∃</sub>; otherwise, v ∈ W<sub>∀</sub>. In particular, all state nodes are contained in W<sub>∀</sub>. For v ∈ W, we put E(v) = {δ(v, a) | a ∈ Σ<sub>v</sub>}, where Σ<sub>v</sub> ⊆ Σ<sub>φ</sub> consists of all letters a that encode the application of some rule to Γ(v) with the condition that the principal formula of the rule application must be ψ<sub>v</sub> if v is not a state node. Finally, we put β(v, w) = α(v, a, w) for (v, w) ∈ E, where a ∈ Σ<sub>v</sub> encodes the rule application that leads from v to w.

**Theorem 20.** *Let* φ *be a closed, irreducible and weakly aconjunctive formula. Then we have* ({φ}, [ ]) ∈ win<sub>∃</sub>(G(φ)) *if and only if* φ *is satisfiable.*

*Proof.* By construction, Eloise wins ({φ}, [ ]) if and only if there is a tableau for φ (labelled by the labelling function Γ); we are done by Theorem 16.

Due to the relatively simple structure and the asymptotically smaller size of the determinized automata Cφ, the resulting permutation games are somewhat easier to construct and can be solved asymptotically faster than the structures created by standard satisfiability decision procedures for the full μ-calculus (e.g. [5,10]) which employ the full Safra/Piterman-construction; note however, that our method is restricted to the weakly aconjunctive fragment.

**Corollary 21.** *The satisfiability of weakly aconjunctive* μ*-calculus formulas can be decided by solving parity games of size* O((nk)!) *with* O(nk) *priorities.*

The winning strategies for Eloise or Abelard in these games define models for or refutations of the respective formulas, so that we have

**Corollary 22.** *Satisfiable weakly aconjunctive* μ*-calculus formulas have models of size* O((nk)!)*.*

### **4 Implementation and Benchmarking**

We have implemented the permutation satisfiability games as an extension of the *Coalgebraic Ontology Logic Reasoner* (COOL) [11], a generic reasoner for coalgebraic modal logics<sup>1</sup>. COOL achieves its genericity by instantiating an abstract reasoner that works for all coalgebraic logics to concrete instances of logics.

<sup>1</sup> Available at https://www8.cs.fau.de/research:software:cool.

To incorporate support for the aconjunctive coalgebraic μ-calculus, we have extended the global caching algorithm that forms the core of COOL to generate and solve the corresponding permutation games, with optional *on-the-fly* solving; games are solved using either our own implementation of the fixpoint iteration algorithm for parity games (as in [1]) or PGSolver [8], which supports a range of game solving algorithms. Instance logics implemented in COOL currently include linear-time, relational, monotone, and alternating-time logics, as well as any logics that arise as combinations thereof. In particular, this makes COOL, to our knowledge, the only implemented reasoner for the aconjunctive fragments of the alternating-time μ-calculus and Parikh's game logic.

Although our tool supports the aconjunctive coalgebraic μ-calculus, we concentrate on the standard relational aconjunctive μ-calculus for experiments, as this allows us to compare our implementation with the reasoner MLSolver [9], which constructs satisfiability games using the Safra/Piterman-construction and hence supports the full relational μ-calculus; MLSolver uses PGSolver for game solving.

To test the implementations, we devise two series of hard aconjunctive formulas with deep alternating nesting of fixpoints. The following formulas encode that each reachable state in a Kripke structure has one of n priorities (encoded by atoms q<sub>i</sub> for 1 ≤ i ≤ n) and belongs to either Eloise (q<sub>e</sub>) or Abelard (q<sub>a</sub>):

$$\phi\_{\mathsf{aut}}(n) = \mathsf{AG}(\bigvee\_{1 \le i \le n} (q\_i \land \bigwedge\_{j \ne i} \neg q\_j)) \quad \phi\_{\mathsf{game}}(n) = \phi\_{\mathsf{aut}}(n) \land \mathsf{AG}((q\_{\mathsf{e}} \land \neg q\_{\mathsf{a}}) \lor (\neg q\_{\mathsf{e}} \land q\_{\mathsf{a}})) $$

Here we use AG ψ to abbreviate νX.(ψ ∧ □X). Then the non-emptiness regions in parity automata and Eloise's winning region in parity games can be specified by the following *aconjunctive* formulas (where ♥ ∈ {♦, □}):

$$\begin{aligned} \phi\_{\mathsf{ne}}(n) &= \eta X\_n.\,\ldots\,\nu X\_2.\,\mu X\_1.\,\psi\_{\lozenge} & \quad \psi\_{\heartsuit} &= \bigvee\_{1 \le i \le n} (q\_i \wedge \heartsuit X\_i) \\ \phi\_{\mathsf{win}}(n) &= \eta X\_n.\,\ldots\,\nu X\_2.\,\mu X\_1.\,\phi\_{\mathsf{start}}(\psi\_{\heartsuit}) & \quad \phi\_{\mathsf{start}}(\psi\_{\heartsuit}) &= (q\_{\mathsf{e}} \wedge \psi\_{\lozenge}) \vee (q\_{\mathsf{a}} \wedge \psi\_{\square}) \end{aligned}$$

Furthermore, we define (for ♥ ∈ {♦, □})

$$\theta\_{\heartsuit}(i) = (q\_i \land \heartsuit Y) \lor \bigvee\_{i < j \le n} (q\_j \land \heartsuit X) \lor \bigvee\_{1 \le j \le i} (q\_j \land \heartsuit Z)$$

The following series of valid formulas states that parity automata with n priorities can be transformed to nondeterministic parity automata with three priorities without affecting the non-emptiness region:

$$\theta\_1(n) := \phi\_{\mathsf{aut}}(n) \to (\phi\_{\mathsf{ne}}(n) \leftrightarrow \bigvee\_{i \text{ even}} \mu X.\nu Y.\mu Z.\theta\_{\diamondsuit}(i))$$

Similarly, if Eloise wins a parity game with n priorities, then she can ensure that in each play, each odd priority 1 ≤ i ≤ n is visited only finitely often, unless a priority greater than i is visited infinitely often (the converse does not hold in general [4]):

$$\theta\_2(n) := \phi\_{\mathsf{game}}(n) \to (\phi\_{\mathsf{win}}(n) \to \bigwedge\_{i \text{ odd}} \nu X.\mu Y.\nu Z.\ \phi\_{\mathsf{start}}(\theta\_{\heartsuit}(i)))$$

**Fig. 1.** Times for ¬θ<sub>1</sub>(n) (unsatisfiable)

**Fig. 2.** Times for ¬θ<sub>2</sub>(n) (unsatisfiable)

Additionally, we devise two series of unsatisfiable formulas that exhibit the advantages of COOL's global caching and on-the-fly-solving capabilities. These formulas are inspired by the CTL-formula series early(n, j, k) and earlygc(n, j, k) from [13] but contain fixpoint-alternation of depth 2<sup>k</sup> inside the subformula θ:

$$\begin{aligned} \texttt{early-ac}(n,j,k) &= \texttt{start}\_{p} \wedge \texttt{init}(p,n) \wedge \texttt{init}(r,k) \wedge \mathsf{AG}\left((r \to \texttt{c}(r,k)) \wedge (p \to \texttt{c}(p,n))\right) \wedge \\ &\quad \mathsf{AG}\left((\bigwedge\_{0 \le i \le j} p\_{i} \to \Diamond(\texttt{start}\_{r} \wedge \theta)) \wedge \neg(p \wedge r) \wedge (r \to \Box r)\right) \\ \texttt{early-ac}\_{\texttt{gc}}(n,j,k) &= \texttt{early-ac}(n,j,k) \wedge b \wedge \texttt{init}(q,n) \wedge \mathsf{AG}\left(\neg(p \wedge q) \wedge \neg(q \wedge r)\right) \wedge \\ &\quad \mathsf{AG}\left((q \to \texttt{c}(q,n)) \wedge \mathsf{AF}\, b \wedge (b \to (\Diamond p \wedge \Diamond\, \texttt{start}\_{q} \wedge \Box b))\right) \\ \texttt{init}(x,m) &= \mathsf{AG}\left((\texttt{start}\_{x} \to (x \wedge \bigwedge\_{0 \le i < m} \neg x\_{i})) \wedge (x \to \Diamond x)\right) \\ \theta &= \eta X\_{2^{k}}.\,\ldots\,\nu X\_{2}.\,\mu X\_{1}.\,\bigvee\_{1 \le i \le 2^{k}} (\mathsf{bin}(r,i-1) \wedge \Diamond X\_{i}), \end{aligned}$$

where c(x, m) encodes an m-bit counter using atoms x<sub>0</sub>, ..., x<sub>m−1</sub> and bin(r, i) denotes the binary encoding of the number i using atoms r<sub>0</sub>, ..., r<sub>k−1</sub>. The formulas early-ac(n, j, k) specify a loop p of length 2<sup>n</sup> that branches after j steps to a second loop r of length 2<sup>k</sup> on which the highest value of the counter (which counts from 0 to 2<sup>k</sup> − 1 and then restarts at 0) is required to be an even number. For constant k, the contradiction on loop r yields a small refutation which can be found early, using on-the-fly solving. The formulas early-ac<sub>gc</sub>(n, j, k) extend this specification by stating that a third loop q of length 2<sup>n</sup> is started from loop p infinitely often. Procedures with sufficient caching capabilities will have to (partially) explore this loop at most once.

We compare the runtimes of MLSolver and COOL on the formulas described above; we let COOL and MLSolver solve games using the local strategy improvement algorithm stratimprloc2 provided by PGSolver. To solve games *on-the-fly* with COOL, however, we use our own implementation of the fixpoint iteration algorithm, which in general is slower than PGSolver but has the advantage that it enables on-the-fly solving. With this option enabled, COOL constructs and solves the satisfiability games step by step and finishes as soon as one of the players has a winning strategy in the partial game. For COOL, we have conducted all experiments with and without on-the-fly solving. For MLSolver, we also enabled the optimizations -opt litpro and -opt comp (and refer to the resulting prover configuration as MLSolverOpt). Tests have been run on a system with Intel Core i7 3.60 GHz CPU with 16 GB RAM. A more detailed description of the results of the experiments as well as binaries of a formula generator, the prover COOL and scripts that benchmark the various configurations of the provers are available in a figshare repository at [12].

**Fig. 3.** early-ac(n, 4, 2) (unsatisfiable)

**Fig. 4.** early-ac<sub>gc</sub>(n, 4, 2) (unsatisfiable)

We observe that COOL without on-the-fly solving generally finishes faster than both MLSolver and MLSolverOpt throughout all tested series of formulas (see Figs. 1–4); the reason for this appears to be that the permutation games solved by COOL are of size O((nk)!), where n ≤ k, and hence asymptotically smaller than the Safra/Piterman games solved by MLSolver, which are of size O(((nk)!)<sup>2</sup>). The size of the refutations for the formulas θ<sub>1</sub>(n) and θ<sub>2</sub>(n) is exponential in n so that on-the-fly solving does in fact *increase* the runtimes of COOL (see Figs. 1 and 2); basically, these formulas cannot be decided early, and therefore any (necessarily unsuccessful) attempt to do so just consumes additional computation time. The formulas early-ac(n, 4, 2) and early-ac<sub>gc</sub>(n, 4, 2), on the other hand, have refutations of size polynomial in n, and COOL appears to benefit from on-the-fly solving for these formulas as it is able to decide them early (see Figs. 3 and 4). As mentioned above, COOL uses our own unoptimized implementation of the fixpoint iteration algorithm [1] for on-the-fly solving; while this implementation is slower than PGSolver's stratimprloc2 algorithm, the on-the-fly abilities of COOL seem to compensate for this disadvantage for the early-ac(n, 4, 2) and early-ac<sub>gc</sub>(n, 4, 2) formulas from n = 11 and n = 8 on, respectively.
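To get a feeling for the gap between the two bounds, the following Python sketch evaluates (nk)! and ((nk)!)<sup>2</sup> for small parameters. It only compares the asymptotic bounds quoted above and says nothing about the actual game sizes produced by COOL or MLSolver; the function names are ours.

```python
from math import factorial


def permutation_bound(n, k):
    # Bound on the size of permutation games: (nk)!
    return factorial(n * k)


def safra_piterman_bound(n, k):
    # Bound on the size of Safra/Piterman games: ((nk)!)^2
    return factorial(n * k) ** 2


for n, k in [(2, 2), (2, 3), (3, 3)]:
    print(n, k, permutation_bound(n, k), safra_piterman_bound(n, k))
```

Since the second bound is the square of the first, the ratio between them is itself (nk)! and grows super-exponentially in nk.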

### **5 Conclusion**

We have presented a method to obtain satisfiability games for the *weakly aconjunctive* μ-calculus. The game construction uses determinization of *limit-deterministic* parity automata, avoiding the full complexity of the Safra/Piterman construction a) in the presentation of the procedure and its correctness proof and b) in the size of the obtained DPA (which is reduced from O(((nk)!)<sup>2</sup>) to O((nk)!)). The resulting permutation satisfiability games for the weakly aconjunctive μ-calculus are of size O((nk)!), have O(nk) priorities, and yield a new bound of O((nk)!) on the model size for this fragment. We have implemented this decision procedure in coalgebraic generality and with support for on-the-fly solving as part of the coalgebraic satisfiability solver COOL; initial experiments show favourable results.

The datasets generated and analyzed during the current study are available in the figshare repository: https://doi.org/10.6084/m9.figshare.5919451.v1.

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Symmetry Reduction for the Local Mu-Calculus**

Kedar S. Namjoshi<sup>1</sup> and Richard J. Trefler<sup>2</sup>

<sup>1</sup> Bell Labs, Nokia, Murray Hill, USA, kedar.namjoshi@nokia-bell-labs.com

<sup>2</sup> University of Waterloo, Waterloo, Canada, trefler@uwaterloo.ca

**Abstract.** Model checking large networks of processes is challenging due to state explosion. In many cases, individual processes are isomorphic, but there is insufficient global symmetry to simplify model checking. This work considers the verification of local properties, those defined over the neighborhood of a process. Considerably generalizing earlier results on invariance, it is shown that all local mu-calculus properties, including safety and liveness properties, are preserved by neighborhood symmetries. Hence, it suffices to check them locally over a set of representative process neighborhoods. In general, local verification approximates verification over the global state space; however, if process interactions are outward-facing, the relationship is shown to be exact. For many network topologies, even those with little global symmetry, analysis with representatives provides a significant, even exponential, reduction in the cost of verification. Moreover, it is shown that for network families generated from building-block patterns, neighborhood symmetries are easily determined, and verification over the entire family reduces to verification over a finite set of representative process neighborhoods.

### **1 Introduction**

Networks of communicating processes are a model for distributed systems, cloud computing environments, routing protocols, many-core hardware processors, and other such systems. Often, networks are described parametrically, that is, a process template is instantiated at each node of a network graph. The expectation then is that basic correctness properties should hold regardless of the size and the shape of the network.

Model checkers can determine, fully automatically, whether a fixed instance of a process network satisfies a correctness property. However, model checking suffers from exponential state explosion as the size of the analyzed network increases. Thus, one may aim for parametric analysis of a network family, "in one fell swoop"; however, the parametric model checking problem (PMCP) is undecidable in general [2]. Limiting to *compositional* proofs makes parametrized verification more tractable; as shown in [20], the PCMCP (Parameterized Compositional Model Checking Problem) can be solved efficiently for standard network families (rings, tori, wrap-around mesh, etc.) where the PMCP is undecidable even for invariance properties.

In this work, we generalize these results considerably, from invariance to mu-calculus properties. We formulate a local version of the mu-calculus to describe behaviors of a single process and its immediate neighborhood. The logic allows specification of safety and liveness properties, each property being limited to assertions over a fixed process neighborhood – e.g., "A hungry philosopher eventually acquires all adjacent forks". The goal of this work is a method to prove such properties for all processes in a network and, moreover, to prove properties parametrically, i.e., for all networks in a family.

Our analysis is based on a grouping of processes by local symmetry, where "balanced" processes have (recursively) similar neighborhoods [17,18,20]. Such symmetries are common in parametric network structures, for example [18,19], *c.f.* [17,20]. We establish that the local state spaces of balanced processes are sufficiently bisimilar that they satisfy the same local mu-calculus properties. It is, therefore, enough to model-check a representative process from each balance class, while paying particular attention to 'interference' transitions from neighboring processes.

We show that any *universal* local mu-calculus property established locally also holds on the global state space. Thus, a universal property can be established globally for all processes by checking it on the local state spaces of a few representatives.

Many communication protocols are designed in such a way that a typical process must offer a given set of input/output services to its communication environment, irrespective of its internal state. We show that under such outward-facing interactions, the correspondence is exact: a local mu-calculus property holds globally if, and only if, it holds locally.

We also detail the implications for entire families of networks that are defined by 'symmetry patterns.' For instance, a network family with a transitive global symmetry group can be analyzed by examining a single representative node. Such dramatic reductions in complexity are generally not possible for non-local properties.

None of the symmetry reduction results rely in any essential manner on the processes being finite-state. To summarize the main results:


We also explore the implications of these results and, in particular, show that in several settings, local symmetries can be determined easily from process syntax. We show that for isomorphic 'normal' processes operating in a network whose communication graph has at least transitive symmetry, a balance relation with a single representative process can be generated from the syntactic description of the network. In another direction, we show that for networks formed from 'building block' patterns, the pattern instances serve as balance representatives. These direct, syntactic constructions avoid having to build global symmetry reduced structures, can lead to exponential reductions in the cost of model checking, and apply to many networks where global symmetry reduction techniques are ineffective. Moreover, entire network families can be model-checked via the analysis of a small number of representative processes, so that the savings in the cost of analysis are unbounded.

### **2 Preliminaries**

**Processes and Networks: Syntax.** A *network* is a directed graph, defined by a set of *nodes*, N, a set of *edges*, E, and two connection relations: *Out* ⊆ N × E and *In* ⊆ N × E. Connections are directed from node n to the edges in *Out*(n), and directed inwards from the edges in *In*(n) to n. Nodes m and n are neighbors, denoted *nbr*(n, m), if they have a common connected edge. Node m *points to* node n if there is an edge e in *Out*(m) ∩ *In*(n).
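The network definitions can be made concrete with a small Python sketch. It is only an illustration under our own naming (the class `Network` and its fields are hypothetical), with the relations *Out* and *In* stored as sets of (node, edge) pairs; the example is a unidirectional ring of three nodes in which node i writes edge e<sub>i</sub> and reads edge e<sub>i−1</sub>.

```python
from dataclasses import dataclass, field


@dataclass
class Network:
    """Network graph with connection relations Out, In as sets of (node, edge)."""
    nodes: set
    edges: set
    out_rel: set = field(default_factory=set)  # Out is a subset of N x E
    in_rel: set = field(default_factory=set)   # In is a subset of N x E

    def out_edges(self, n):
        return {e for (m, e) in self.out_rel if m == n}

    def in_edges(self, n):
        return {e for (m, e) in self.in_rel if m == n}

    def connected(self, n):
        # All edges connected to n, in either direction.
        return self.out_edges(n) | self.in_edges(n)

    def nbr(self, n, m):
        # Neighbors share a connected edge (read literally from the text,
        # so a node with a connected edge is also its own neighbor).
        return bool(self.connected(n) & self.connected(m))

    def points_to(self, m, n):
        # m points to n if some edge lies in Out(m) and In(n).
        return bool(self.out_edges(m) & self.in_edges(n))


# Unidirectional ring of three nodes (assumed example).
ring = Network(
    nodes={0, 1, 2},
    edges={"e0", "e1", "e2"},
    out_rel={(i, f"e{i}") for i in range(3)},
    in_rel={(i, f"e{(i - 1) % 3}") for i in range(3)},
)
```

In this ring, node 0 points to node 1 but not vice versa, while all three nodes are pairwise neighbors.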

A *process* is defined by a tuple (V, I, T), where V is a set of variables which defines its local state space; I(V) is a Boolean predicate defining the initial states; and T(V, V′) is a Boolean predicate defining the state transitions, using a copy V′ to denote the next state. Variables are partitioned into *internal* and *external* variables. External variables are labeled as *read*, or *write*, or both. The transition relation is required to preserve the value of read-only variables and its enabledness cannot depend on the values of write-only variables.

A *process network* P is defined by a network graph, a set of processes, and an assignment, ξ. Every node n is assigned a process ξ(n), which we denote for convenience by P<sub>n</sub> = (V<sub>n</sub>, I<sub>n</sub>, T<sub>n</sub>). Each edge e is assigned a variable ξ(e) in V = (⋃ n : V<sub>n</sub>). The assignment ξ must assign *In*(n) to the read variables in V<sub>n</sub>, *Out*(n) to the write variables of V<sub>n</sub>, and the internal variables of V<sub>n</sub> to no network edge. The *shared* variables of processes P<sub>m</sub> and P<sub>n</sub> are those assigned to common connected edges of m and n.

**Processes and Networks: Semantics.** Semantically, the behavior of a process network P is defined as the process P = (V, I, T), where V = (⋃ n : V<sub>n</sub>), I = (⋀ n : I<sub>n</sub>), and T = (⋁ n : T<sub>n</sub> ∧ unchanged(V \ V<sub>n</sub>)). This defines an interleaving semantics, with unchanged(W) denoting that the values of variables in W are unchanged.
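The interleaving semantics can be sketched in Python as follows. This is a minimal illustration under our own assumptions: a global state is a dictionary from variable names to values, each process is given by its variable list and a hypothetical function `local_step` that returns the possible next local states, and the two example processes share the variable `c`.

```python
def global_successors(state, processes):
    """Successors of a global state under interleaving.

    state: dict mapping every variable to its value.
    processes: dict node -> (variables, local_step), where local_step maps a
    local state (dict over the process variables) to a list of next local states.
    """
    successors = []
    for variables, local_step in processes.values():
        local = {v: state[v] for v in variables}
        for new_local in local_step(local):
            nxt = dict(state)      # variables outside V_n stay unchanged
            nxt.update(new_local)  # variables in V_n follow T_n
            successors.append(nxt)
    return successors


def step0(s):
    # Process 0 may move only when the shared flag c is 0.
    return [{"x": s["x"] + 1, "c": 1}] if s["c"] == 0 else []


def step1(s):
    # Process 1 may move only when the shared flag c is 1.
    return [{"y": s["y"] + 1, "c": 0}] if s["c"] == 1 else []


processes = {0: (("x", "c"), step0), 1: (("y", "c"), step1)}
```

Each global step is the step of exactly one process; the shared variable `c` is how the two example processes influence each other.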

A *global* state is a function mapping variables in V to values in their domain. A *local* state of P<sub>n</sub> is a function mapping the variables in V<sub>n</sub> to values in their domain. An *internal* state of P<sub>n</sub> is a function mapping the internal variables of P<sub>n</sub> to values in their domains.

For neighbors m, n, a *joint state* is a pair x = (x<sub>m</sub>, x<sub>n</sub>), where x<sub>m</sub> and x<sub>n</sub> are local states of processes P<sub>m</sub> and P<sub>n</sub>, respectively, such that x<sub>m</sub> and x<sub>n</sub> have the same value for all shared variables. The transition relation T<sub>n</sub> is extended to joint states as T<sub>n</sub>(x, y), which holds iff T<sub>n</sub>(x<sub>n</sub>, y<sub>n</sub>) holds and the values of variables in P<sub>m</sub> that are not shared with P<sub>n</sub> are unchanged.

**Invariants: Global and Compositional.** Invariance is central to reasoning about dynamic system behavior. For a process network P as defined above, a *global assertion*, θ, is a set of global states of P. It is an *inductive invariant* for P if all initial states are in θ, i.e., [I(x) → θ(x)], and θ is closed under transitions, i.e., [θ(x) ∧ T(x, y) → θ(y)].<sup>1</sup>

In place of a single invariance assertion, compositional reasoning postulates a set of *local assertions*, {θ<sub>n</sub>}, where θ<sub>n</sub> is a set of local states of P<sub>n</sub>, for each n. This set is a *compositional inductive invariant* if, for all n:

**(Init)** The initial states of P<sub>n</sub> are included in θ<sub>n</sub>. That is, [I<sub>n</sub>(x<sub>n</sub>) → θ<sub>n</sub>(x<sub>n</sub>)]

**(Step)** Transitions of P<sub>n</sub> preserve θ<sub>n</sub>. That is, [θ<sub>n</sub>(x<sub>n</sub>) ∧ T<sub>n</sub>(x<sub>n</sub>, y<sub>n</sub>) → θ<sub>n</sub>(y<sub>n</sub>)]

**(Non-Interference)** Assertion θ<sub>n</sub> is preserved by transitions of neighbors P<sub>m</sub>, from every joint state satisfying both θ<sub>m</sub> and θ<sub>n</sub>. I.e., for all m such that *nbr*(n, m) and all joint states x = (x<sub>n</sub>, x<sub>m</sub>), y = (y<sub>n</sub>, y<sub>m</sub>): [θ<sub>n</sub>(x<sub>n</sub>) ∧ θ<sub>m</sub>(x<sub>m</sub>) ∧ T<sub>m</sub>(x, y) → θ<sub>n</sub>(y<sub>n</sub>)]

These constraints are in a simultaneous pre-fixpoint form over {θ<sub>n</sub>}. The least fixpoint is the strongest compositional invariant. For finite-state processes, this computation is polynomial-time in the size of the local state spaces.
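For finite-state processes, the least fixpoint of the three constraints can be computed by a straightforward iteration. The following Python sketch is one possible reading of that computation, not the implementation of [17]: the function name, the encoding of local states as dictionaries, and the two example processes (sharing a flag `c`) are our assumptions.

```python
def strongest_compositional_invariant(procs, nbrs):
    """procs: node -> (variables, initial_states, step); states are dicts and
    step returns a list of complete next local states.
    nbrs: node -> iterable of neighboring nodes."""
    def freeze(s):
        return frozenset(s.items())

    # (Init): start from the initial local states.
    theta = {n: {freeze(s) for s in init} for n, (_, init, _) in procs.items()}
    changed = True
    while changed:
        changed = False
        for n, (vars_n, _, step_n) in procs.items():
            new = set()
            # (Step): close under the transitions of P_n itself.
            for xn in theta[n]:
                for yn in step_n(dict(xn)):
                    new.add(freeze(yn))
            # (Non-Interference): close under neighbor transitions from
            # joint states, i.e. pairs that agree on the shared variables.
            for m in nbrs[n]:
                vars_m, _, step_m = procs[m]
                shared = set(vars_n) & set(vars_m)
                for xn in theta[n]:
                    for xm in theta[m]:
                        dn, dm = dict(xn), dict(xm)
                        if any(dn[v] != dm[v] for v in shared):
                            continue
                        for ym in step_m(dm):
                            yn = dict(dn)  # non-shared variables unchanged
                            yn.update({v: ym[v] for v in shared})
                            new.add(freeze(yn))
            if not new <= theta[n]:
                theta[n] |= new
                changed = True
    return theta


def step0(s):
    return [{"a": 1, "c": 1}] if s["c"] == 0 else []


def step1(s):
    return [{"b": 1, "c": 0}] if s["c"] == 1 else []


procs = {
    0: (("a", "c"), [{"a": 0, "c": 0}], step0),
    1: (("b", "c"), [{"b": 0, "c": 0}], step1),
}
nbrs = {0: [1], 1: [0]}
theta = strongest_compositional_invariant(procs, nbrs)
```

In this example the invariant for process 0 excludes the local state with a = 0 and c = 1, whereas the invariant for process 1 contains all four local states, which illustrates that the compositional invariant is in general an over-approximation of the reachable local states.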

**Theorem 1** [17]*. If* {θ<sub>n</sub>} *is a compositional inductive invariant then* (⋀ i : θ<sub>i</sub>) *is a global inductive invariant.*

**Symmetry Between Neighborhoods.** A neighborhood symmetry between nodes m and n is witnessed by a bijection, β, which maps edges in *In*(m) to those in *In*(n) and edges in *Out*(m) to those in *Out*(n); we call (m, β, n) a similarity. The set of similarities (m, β, n) is a groupoid<sup>2</sup>.

A *balance* relation ([17], *cf.* [11]) links symmetries throughout a network: balanced nodes m, n have isomorphic neighborhoods, nodes connected to corresponding edges of m, n are themselves balanced, and so on. Formally, a balance relation, B, is a set of triples (m, β, n), such that (m, β, n) is a similarity; (n, β<sup>−1</sup>, m) is in B; and for any node k that points to m, there is a node l which points to n and a bijection γ such that (k, γ, l) is in B, and γ(e) = β(e) for every edge e that is connected to both m and k.

The structure of this condition is similar to that of bisimulation (it is coinductive); thus, there is a greatest fixpoint, which is the largest balance relation. Nodes m, n are *balanced* if (m, β, n) is in the largest balance relation for some β.
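The three conditions of a balance relation can be checked directly for a given candidate set of triples. The Python sketch below does this under our own encoding assumptions: the function name is hypothetical, *In* and *Out* are dictionaries from nodes to edge sets, and each bijection β is a frozenset of (edge, image) pairs. The example is the rotation symmetry of a unidirectional ring of three nodes.

```python
def is_balance_relation(B, in_edges, out_edges):
    """B: set of triples (m, beta, n); beta is a frozenset of edge pairs."""
    nodes = set(in_edges) | set(out_edges)

    def conn(v):
        return in_edges.get(v, set()) | out_edges.get(v, set())

    def points_to(k, m):
        return bool(out_edges.get(k, set()) & in_edges.get(m, set()))

    for (m, beta, n) in B:
        b = dict(beta)
        # (m, beta, n) must be a similarity: beta is injective on the edges
        # of m and maps In(m) onto In(n) and Out(m) onto Out(n).
        if set(b) != conn(m) or len(set(b.values())) != len(b):
            return False
        if {b[e] for e in in_edges.get(m, set())} != in_edges.get(n, set()):
            return False
        if {b[e] for e in out_edges.get(m, set())} != out_edges.get(n, set()):
            return False
        # The inverse triple must be in B as well.
        inverse = frozenset((f, e) for e, f in b.items())
        if (n, inverse, m) not in B:
            return False
        # Every node k pointing to m needs a matching l pointing to n.
        for k in nodes:
            if not points_to(k, m):
                continue
            common = conn(m) & conn(k)
            if not any(
                k2 == k
                and points_to(l, n)
                and all(dict(gamma).get(e) == b[e] for e in common)
                for (k2, gamma, l) in B
            ):
                return False
    return True


# Ring of three nodes: node i writes edge e_i and reads edge e_(i-1).
ring_in = {i: {f"e{(i - 1) % 3}"} for i in range(3)}
ring_out = {i: {f"e{i}"} for i in range(3)}


def rot(i, j):
    # Bijection induced by rotating node i onto node j.
    return frozenset({(f"e{i}", f"e{j}"),
                      (f"e{(i - 1) % 3}", f"e{(j - 1) % 3}")})


ring_B = {(i, rot(i, j), j) for i in range(3) for j in range(3)}
```

The set `ring_B` passes the check, so all three ring nodes are balanced; a candidate that contains a single triple without its inverse is rejected. Computing the largest balance relation would additionally require a greatest-fixpoint iteration over candidate triples, which this sketch does not attempt.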

<sup>1</sup> The notation, [ϕ], from Dijkstra and Scholten [7], means that ϕ is valid.

<sup>2</sup> I.e., (n, ι, n) is a similarity for the identity map ι; if (m, β, n) is a similarity, so is (n, β<sup>−1</sup>, m); and if (m, β, q) and (q, γ, n) are similarities, so is (m, (γβ), n).

A process network P *respects* balance relation B if balanced nodes are assigned processes with isomorphic initial states and transition relations: i.e., for all (m, β, n) ∈ B, it is the case that [I<sub>n</sub>(β(s)) ≡ I<sub>m</sub>(s)] for all s, and [T<sub>n</sub>(β(s), β(t)) ≡ T<sub>m</sub>(s, t)] for all s, t. Similarly, we say that local assertions {φ<sub>i</sub>} respect B if [φ<sub>n</sub>(β(s)) ≡ φ<sub>m</sub>(s)] for all (m, β, n) ∈ B. We abbreviate these conditions as [I<sub>n</sub> ≡ β(I<sub>m</sub>)], [T<sub>n</sub> ≡ β(T<sub>m</sub>)] and [φ<sub>n</sub> ≡ β(φ<sub>m</sub>)], respectively. Here, β is overloaded to permute local states of P<sub>m</sub>. For local state s of node m, the local state β(s) at node n is defined as follows: the internal states of m in s and n in β(s) are identical and, for every edge e connected to m, the value on e in s is identical to the value of β(e) in β(s). A key result is that balanced nodes have isomorphic compositional invariants.

**Theorem 2 (**[17]**).** *If a process network respects balance relation* B*, its strongest compositional invariant also respects* B*.*

This theorem implies that it suffices to compute the strongest compositional invariant only for representative nodes<sup>3</sup>, as the invariants for all other nodes are isomorphic to those of their representatives.
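The action of β on local states, and the transfer of an invariant from a representative to another node in its class, can be illustrated as follows. This is a sketch under an assumed encoding that is not from the text: a local state is a pair `(internal, edge_values)`, an assertion is a list of such states, and the function names are hypothetical.

```python
def apply_beta(beta, state):
    """Map a local state of node m to the corresponding state of node n.

    A local state is modelled as (internal, edge_values), where edge_values
    is a dict from edge name to value. beta is a dict from the edges of m to
    the edges of n. The internal part is copied unchanged, and the value on
    edge e is moved to edge beta[e].
    """
    internal, edge_values = state
    return internal, {beta[e]: v for e, v in edge_values.items()}


def transport(beta, theta):
    """Image of an assertion (a list of local states) under beta."""
    return [apply_beta(beta, s) for s in theta]
```

With θ<sub>r</sub> computed for a representative r and a triple (r, γ, n) in B, `transport(gamma, theta_r)` yields the assertion for n, so only the representative needs a fixpoint computation.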

### **3 The Local Mu-Calculus**

Intuitively, a local property is one that refers to the local state of a node, e.g., "the process at node n is in its critical section", or "the philosopher at node n holds all adjacent forks". We are interested in establishing a local property f(n), parameterized by node n, and so isomorphic between nodes, for *all* nodes of a process network. We represent such a property by a mu-calculus formula. This has two interpretations: one in the global state space, the other in a compositionally constructed local state space. Their connections are discussed in the next section.

#### **3.1 Syntax**

The syntax and semantics of the local mu-calculus are largely identical to those of the standard mu-calculus [15]. The only difference is the use of the E[U] operator in place of EX; this operator is given a stuttering-insensitive semantics.

Let Σ be a set of atomic propositions, Γ be a set of propositional variables, and Δ a set of transition labels; these sets are mutually disjoint. Local mu-calculus formulas are defined by the following grammar. A formula is one of


<sup>3</sup> A balance relation B induces the equivalence relation m ≃<sub>B</sub> n if (m, β, n) ∈ B for some β. The compositional fixpoint is calculated for a representative of each class of ≃<sub>B</sub>. In the fixpoint calculation, the assertion θ<sub>n</sub> is replaced by γ(θ<sub>r</sub>), where r is the representative for n, and γ is a chosen isomorphism such that (r, γ, n) is in B.


Operators A[ϕ W<sub>a</sub> ψ] = ¬E[¬ϕ U<sub>a</sub> ¬ψ] and νZ.ϕ(Z) = ¬μZ.(¬ϕ(¬Z)) are the negation duals of E[U] and μ, respectively, with Boolean operations ∨ and → defined as usual.

#### **3.2 Semantics**

A state space has the form (S, S<sub>0</sub>, R, L), where S is a set of states, S<sub>0</sub> is the set of initial states, R ⊆ S × (Δ ∪ {τ}) × S is a left-total transition relation, and L : S → 2<sup>Σ</sup> labels states with atomic propositions. A path is a sequence s<sub>0</sub>, a<sub>0</sub>, s<sub>1</sub>, a<sub>1</sub>, ... such that (s<sub>i</sub>, a<sub>i</sub>, s<sub>i+1</sub>) ∈ R for all i, where the sub-sequence a<sub>0</sub>, a<sub>1</sub>, ... is the label sequence of the path.

The state set S generates a complete lattice of all subsets of S, ordered by set inclusion. A functional Π : 2<sup>S</sup> → 2<sup>S</sup> is monotone if for all A, B such that A ⊆ B it is the case that Π(A) ⊆ Π(B). By the Knaster-Tarski theorem, every monotone functional has a least and a greatest fixpoint. Consider a formula ϕ(Z<sub>1</sub>, ..., Z<sub>d</sub>) with free variables Z<sub>1</sub>, ..., Z<sub>d</sub>. Given an assignment λ mapping each free variable to a subset of S, the interpretation of ϕ under λ, interp(ϕ, λ), is defined inductively as follows. We write M, s ⊨ ϕ to mean that state s in space M satisfies a closed formula ϕ, i.e., s is in the interpretation of ϕ under the empty assignment.
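On a finite state set, the least and greatest fixpoints guaranteed by the Knaster-Tarski theorem can be computed by plain iteration from the bottom or the top of the lattice. The following is a minimal sketch; the function names are illustrative, and the functional is assumed to be monotone and to return an iterable of states.

```python
def lfp(functional):
    """Least fixpoint of a monotone functional on subsets of a finite set.

    Iterates upward from the empty set until the value stabilises.
    """
    current = frozenset()
    while True:
        nxt = frozenset(functional(current))
        if nxt == current:
            return current
        current = nxt


def gfp(functional, states):
    """Greatest fixpoint, iterating downward from the full state set."""
    current = frozenset(states)
    while True:
        nxt = frozenset(functional(current))
        if nxt == current:
            return current
        current = nxt
```

For instance, with a transition set `R` of `(source, target)` pairs, `lfp(lambda Z: target | {s for (s, t) in R if t in Z})` collects the states that can reach `target`, which is the usual reading of a μ-formula for reachability.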


#### **3.3 Local and Global Interpretations**

Let θ be a compositional invariant respecting a balance relation B. For any node n of the network, define H<sup>θ</sup><sub>n</sub> as the following transition system:

	- A transition (labeled with n) by P<sub>n</sub> from state s, or
	- An interference transition (labeled with m) by a neighbor P<sub>m</sub> from a joint state (s, u) where θ<sub>n</sub>(s) and θ<sub>m</sub>(u) hold, to a joint state (s′, u′).

By the properties of a compositional invariant, the resulting state s′ is in θ<sub>n</sub> in both cases.

The only missing ingredient is a labeling of the states with atomic propositions. Given such a labeling, L, a closed formula evaluates to a set of local states.

The global transition system G defines the semantics of the process network. For a given n, let G<sub>n</sub> be G with transitions by P<sub>n</sub> labeled with n, transitions by neighbors m of n labeled with m, and all other transitions (which cannot change the local state of P<sub>n</sub>) labeled with τ. A local labeling L of P<sub>n</sub> is extended to G<sub>n</sub> by labeling a global state s with proposition p if p labels the local state of P<sub>n</sub> in s. Formulas local to node n are evaluated over G<sub>n</sub>. A closed formula evaluates to a set of global states.

#### **3.4 Simulation and Bisimulation**

For processes without τ actions, a simulation relation α from process P to process Q is a relation from the state space of P to that of Q, satisfying:


If a simulation relation exists from P to Q, we say that Q simulates P. It is well known that if Q simulates P, then any standard universal mu-calculus formula that holds for all initial states of Q also holds for all initial states of P. A universal local mu-calculus formula is one whose negation normal form does not contain E[U]. Relation α is a bisimulation from P to Q if α is a simulation from P to Q and α<sup>−1</sup> is a simulation from Q to P. It is well known that bisimilar processes satisfy the same standard mu-calculus properties.
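For finite processes without τ actions, the largest simulation can be computed by refinement. The sketch below assumes the standard textbook conditions (related states carry the same atomic propositions, every labeled step of the simulated state is matched by an equally labeled step of the simulating state, and initial states are related); these conditions and all names are assumptions of this illustration.

```python
def largest_simulation(states_p, trans_p, label_p, states_q, trans_q, label_q):
    """Largest relation alpha from P to Q such that s alpha t implies equal
    labels and every a-step of s is matched by an a-step of t into a
    related pair.

    trans_* are sets of (source, action, target) triples; label_* map a
    state to a frozenset of atomic propositions.
    """
    alpha = {(s, t) for s in states_p for t in states_q
             if label_p[s] == label_q[t]}

    def steps_matched(s, t):
        return all(
            any(a2 == a and t0 == t and (s1, t1) in alpha
                for (t0, a2, t1) in trans_q)
            for (s0, a, s1) in trans_p if s0 == s
        )

    # Discard pairs that fail the matching condition until none remain.
    while True:
        bad = {(s, t) for (s, t) in alpha if not steps_matched(s, t)}
        if not bad:
            return alpha
        alpha -= bad


def simulates(alpha, init_p, init_q):
    """Every initial state of P is related to some initial state of Q."""
    return all(any((s, t) in alpha for t in init_q) for s in init_p)
```

Running the same computation in both directions and intersecting α with the inverse of the reverse relation gives a simple, if not the most efficient, bisimilarity check for small state spaces.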

For processes with τ transitions, one can relax the third condition to allow the possibility of stuttering (cf. [4]): if s α t holds, then for any state s′ reachable from s by a finite path π with label sequence τ<sup>∗</sup>; a (for a non-τ letter a), there is a state t′ reachable from t by a finite path δ labeled τ<sup>∗</sup>; a such that s′ and t′ are related by α, and every other pair of states u on π and v on δ is related by α. Relation α is a stuttering bisimulation if α and α<sup>−1</sup> are stuttering simulations.

**Theorem 3.** *Stuttering simulation preserves universal local mu-calculus properties. Stuttering bisimulation preserves all local mu-calculus properties.*

### **4 Connecting Local Mu-Calculus Interpretations**

We explore relationships between the local and global interpretation of formulas, and show the following:

– The local state spaces of balanced nodes are bisimilar. It follows from Theorem 3 that balanced nodes satisfy the same local mu-calculus formulas. From this result, to model check a property of the form (∀i :: f(i)), it suffices to check f(i) for the representatives of the balance equivalence classes.


*Notation.* In the proofs below, for a local state s of node n, the notation s[n] refers to the internal state of P<sub>n</sub> in s, and for an edge e that is connected to n, the notation s[e] refers to the value in s of the variable assigned to e.

#### **4.1 Bisimilarity Between Local State Spaces**

**Theorem 4.** *Let* B *be a balance relation on a process network* P*, and* θ *a compositional invariant for the network. If* P *and* θ *respect* B*, then for every* (m, β, n) *in* B, H<sup>θ</sup><sub>m</sub> *and* H<sup>θ</sup><sub>n</sub> *are bisimilar up to* β*.*

**Proof:** The bisimulation relation R relates a local state s of node m to a local state t of node n if β(s) = t. Before getting to the details of the proof, which is technical, we sketch the main reasoning. First, local transitions are easily matched by symmetry. For an interfering transition from a neighbor k of m, by balance, there is a matching neighbor l of n with a symmetric interference transition. Crucially, the preservation of the compositional invariant under balance lets us transfer the joint state from which the interference transition occurs in H<sup>θ</sup><sub>m</sub> to a joint state with a matching interference transition in H<sup>θ</sup><sub>n</sub>.

Suppose that s, t are states of m and n in the local state spaces H<sup>θ</sup><sub>m</sub> and H<sup>θ</sup><sub>n</sub>, respectively, such that s R t holds, that is, β(s) = t. By construction of H<sup>θ</sup><sub>m</sub> and H<sup>θ</sup><sub>n</sub>, θ<sub>m</sub>(s) and θ<sub>n</sub>(t) hold.

Consider a step transition T<sub>m</sub>(s, s′). Since T<sub>m</sub> and T<sub>n</sub> respect the balance relation, B, by the local symmetry between the transition relations, T<sub>n</sub>(β(s), β(s′)) holds as well. Thus, for t′ = β(s′), we have that there is a step transition T<sub>n</sub>(t, t′) such that s′ R t′. By construction, s′ and t′ are successors of s and t, respectively, in the local state spaces.

Now consider an interference transition in H<sup>θ</sup><sub>m</sub> from a joint state (s, u) where u is a local state of a neighbor k of m. The transition T<sub>k</sub>(u, u′) creates a joint state (s′, u′). From the definition of balance, there is a neighbor l of n such that for some γ, we have (k, γ, l) in the balance relation. As θ respects B by assumption, we have that θ<sub>l</sub> = γ(θ<sub>k</sub>). As θ<sub>k</sub>(u) holds by the definition of the interference transition, the state v = γ(u) is in θ<sub>l</sub>. We claim that there is a matching transition from the joint state (t, v).

First, we show that the pair (t, v) forms a joint state. Consider any edge f that is shared between n and l. By balance, shared edges are mapped identically by β and γ; hence, e = β<sup>−1</sup>(f) = γ<sup>−1</sup>(f) is shared by m and k. By the definition of t = β(s) and v = γ(u), we have that t[f] = s[e] and v[f] = u[e]. As (s, u) is a joint state, we have s[e] = u[e]; hence, t[f] = v[f]. As f was chosen arbitrarily, it follows that t and v agree on the values of all shared edges, so (t, v) is a joint state. Moreover, the state t is in θ<sub>n</sub> by assumption, and v is in θ<sub>l</sub> by construction.

By the similarity between P<sub>k</sub> and P<sub>l</sub>, there is a transition T<sub>l</sub>(γ(u), γ(u′)); letting v′ = γ(u′), this can be expressed as T<sub>l</sub>(v, v′). That induces an interference transition in H<sup>θ</sup><sub>n</sub> from the joint state (t, v) to a joint state (t′, v′).

Finally, we show that t′ = β(s′). Let e be an edge connected to node m and let f = β(e). Note that f is shared between n and l if, and only if, e is shared between m and k. Now if f is not shared between n and l, then t′[f] = t[f] by definition of interference; t[f] = s[e] as t = β(s); and s′[e] = s[e] by definition of interference. By transitivity, t′[f] = s′[e], as required. If f is a shared edge, then t′[f] = v′[f] by joint state; v′[f] = u′[e] as v′ = γ(u′); and u′[e] = s′[e] by joint state. By transitivity, t′[f] = s′[e]. The internal states of t, t′ and s, s′ are (respectively) identical, as they are unaffected by interference. Hence, t′ = β(s′).

The proof so far shows that R is a simulation if (m, β, n) is in the balance relation. From the same argument applied to (n, β<sup>−1</sup>, m), which must also be in the balance relation, the inverse of R is also a simulation. Hence, R is a bisimulation between H<sup>θ</sup><sub>m</sub> and H<sup>θ</sup><sub>n</sub>. **EndProof.**

We say that per-process propositional labelings *respect* balance if for every (m, β, n) in the balance relation, every atomic proposition p, and every local state s: [p ∈ L<sub>n</sub>(β(s)) ≡ p ∈ L<sub>m</sub>(s)]. From Theorems 3 and 4, we obtain:

**Corollary 1.** *Let* f(i) *be a local mu-calculus formula parameterized by* i*. If the compositional invariant* θ *and the interpretation of the atomic propositions in* f *respect balance relation* B*, then for any* (m, β, n) *in* B *and any local state* s*:* H<sup>θ</sup><sub>m</sub>, s ⊨ f(m) *if, and only if,* H<sup>θ</sup><sub>n</sub>, β(s) ⊨ f(n)*.*

#### **4.2 Local-Global Simulation**

From the point of view of a process P<sub>m</sub>, a transition in the global state space is either a transition of P<sub>m</sub>, or an interference transition by one of the neighbors of m, or a transition by a "far away" process that has no immediate effect on the local space of m. Thus, global transitions can be simulated by step or interference transitions in the local space, with far-away transitions exhibiting stuttering. The converse need not be true, as interference transitions appear in the local space without the constraining context of the entire global state.

**Theorem 5.** *Let the scheduling of transitions in the global system be unconditionally fair. For every* m *and any compositional inductive invariant* θ, H<sup>θ</sup><sub>m</sub> *simulates the global transition system* G<sub>m</sub> *up to stuttering.*

**Proof:** For a global state s, let s[m] refer to the local state of node m in s. Define the relation R from global states to those of H<sup>θ</sup><sub>m</sub> by (s, t) ∈ R iff θ(s) and s[m] = t. We show that R is a simulation, up to stuttering. The proof is by cases on the kinds of transitions from global state s to a successor state, s′. As θ is a global *inductive* invariant by Theorem 1, it is the case that θ(s′) holds.

Suppose the transition is by process m. Thus, T<sub>m</sub>(s[m], s′[m]) holds. As θ<sub>m</sub>(s[m]) holds, this transition is in the local state space as well. Letting t′ = s′[m], we have s′ R t′.

Suppose the transition is by a neighbor k of m, so that T<sub>k</sub>(s[k], s′[k]) holds, and for all edges e that are not connected to k, s′[e] = s[e]. By definition, θ<sub>m</sub>(s[m]) and θ<sub>k</sub>(s[k]) hold, so this is a valid interference transition in the local state space H<sup>θ</sup><sub>m</sub>. Denoting s[k] by u, this can be re-expressed as a joint transition from state (t, u) to (t′, u′), where u′ = s′[k]. Consider an edge e that is connected to m but not to k. Then t′[e] = (by non-adjacency) t[e] = (by R) s[m][e] = (by non-adjacency) s′[m][e]. Now consider an edge e that is shared by nodes m and k; then t′[e] = (by shared edge) u′[e] = (by definition) s′[k][e] = (by shared edge) s′[m][e]. The internal state of m is unchanged on either transition. Thus, t′ = s′[m], so that s′ R t′, as desired.

Finally, suppose the transition is by a process that is not a neighbor of m. Then s′[m] = s[m], so that s′ R t holds. This is the stuttering step. As transitions are scheduled in an unconditionally fair manner, on any infinite computation from s, process m or one of its neighbors must eventually make a move. Hence, all stuttering is bounded. This establishes (fair) stuttering simulation between the two spaces. **EndProof.**

From the preservation of universal local mu-calculus properties under stuttering simulation, we have:

**Corollary 2.** *If* f(m) *is a universal local mu-calculus formula, then for any* t, s *such that* s[m] = t*:* H<sup>θ</sup><sub>m</sub>, t ⊨ f(m) *implies that* G<sub>m</sub>, s ⊨ f(m) *under fairness.*

#### **4.3 Outward-Facing Interactions and Local-Global Bisimulation**

The obstacle to establishing bisimilarity in the proof of Theorem 5 is that an interference transition from local state t may not have a corresponding transition from a related global state s, as the internal state of the interfering neighbor in s may be different from the internal state of the interfering neighbor of t. In some protocols, however, we see that interference depends only on the shared state. For instance, in a form of the dining philosophers' protocol where a process may give up a fork if it is not eating, the interference transition (passing a fork to a neighbor) is dependent only on possession of the fork. In this setting, one can indeed show that the two spaces are bisimilar.

We express the independence from internal state as a stuttering bisimulation within the interfering process. Define a relation B<sub>m,n</sub> on the local state space of P<sub>n</sub> by (u, v) ∈ B<sub>m,n</sub> if u and v are both in θ<sub>n</sub>, and u[e] = v[e] for every edge e shared between m and n. We say that process n is *outward-facing* in interactions with its neighbor m if the relation B<sub>m,n</sub> is a stuttering bisimulation on H<sup>θ</sup><sub>n</sub>.
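The relation B<sub>m,n</sub> itself is straightforward to construct from θ<sub>n</sub> and the set of shared edges. The following sketch assumes a hashable encoding of local states, `(internal, edge_values)` with `edge_values` a tuple of `(edge, value)` pairs; the encoding and the function name are illustrative only. Checking that the resulting relation is a stuttering bisimulation on the local state space is a separate step that is not shown.

```python
def outward_relation(theta_n, shared_edges):
    """Pairs (u, v) of local states of P_n, both in theta_n, that agree on
    every edge shared with the neighbour m.

    A local state is modelled as (internal, edge_values), with edge_values
    a tuple of (edge, value) pairs so that states are hashable.
    """
    def shared_view(state):
        _, edge_values = state
        values = dict(edge_values)
        return tuple(values[e] for e in shared_edges)

    return {(u, v) for u in theta_n for v in theta_n
            if shared_view(u) == shared_view(v)}
```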

**Theorem 6.** *With outward-facing interaction, the local state space of process* m *is stuttering bisimilar to the global state space in terms of the local state of* m*.*

**Proof:** Define the relation R from global states to those of H<sup>θ</sup><sub>m</sub> as in the proof of Theorem 5 by (s, t) ∈ R iff θ(s) and s[m] = t.

Consider a transition from t to t′. If the move is by process m, it is enabled in s as well, and the resulting states are related by R. Now suppose the move is an interference transition by a neighbor, n. Hence there is some joint state (t, u) of (m, n) such that the move is by n from (t, u) to (t′, u′). As u ∈ θ<sub>n</sub> (by joint state) and s[n] ∈ θ<sub>n</sub> (by definition of R), and the two are connected to the same local state of m, the pair (s[n], u) is in B<sub>m,n</sub>. As B<sub>m,n</sub> is a stuttering bisimulation, there is a sequence, say σ, of transitions by P<sub>n</sub> alone from s[n] to a state v′ such that (v′, u′) ∈ B<sub>m,n</sub>, and all intermediate states on σ from s[n] to v′ are related by B<sub>m,n</sub> to u. Hence, the value of the shared edges between m and n is unchanged on σ until the final step, where it matches u′. Therefore, for the global computation induced by σ from s, the final state s′ is such that s′ R t′, and for all intermediate global states x on that path, x R t holds. This shows that R<sup>−1</sup> is a stuttering simulation from the local to the global space. By Theorem 5, the relation R is a simulation from the global to the local space. Hence, R is a stuttering bisimulation between the spaces. **EndProof.**

**Corollary 3.** *With outward-facing interaction and unconditionally fair scheduling, the local state space of a process* m *satisfies the same local mu-calculus properties as the global state space.*

### **5 Syntactic Determination of Local Symmetries**

We show how to recognize local symmetry from syntactic structure. This also applies to network families, with corresponding unbounded savings in local verification. First, we use relations between structure and global symmetry, and between global and local symmetries. Next, we show how local symmetries may be directly derived if network families are induced by a finite set of tilings. We note that when local symmetry is derived syntactically, either through the use of normal process descriptions, or through building block tiles, the computation of the compositional invariant can be done symbolically, and in the case of tilings, directly on each tile, unlike the case of global symmetry reduction, where the symbolic (BDD-based) orbit relation is difficult to compute even for fully symmetric networks [5].

#### **5.1 Program Symmetries**

Let P = ||<sub>i∈[0..k−1]</sub> P<sub>i</sub>, k ≥ 1, be a fixed network where each component P<sub>i</sub> is an implementation of a process template W. Network topology is restricted so that all edges are bidirectional and connect only two nodes. Each P<sub>m</sub> is described by a finite transition graph where if there is an arc from the internal node g to the internal node h then the arc is labeled by a guarded command ρ → A. Transitions are given by g : ρ → A : h where A is the local update function and ρ is a predicate over the neighborhood of P<sub>m</sub>. The action A is given by a list of simultaneous updates to the shared variables, v<sub>1</sub>, ..., v<sub>d</sub>, where v<sub>i</sub> is the variable across the edge (m, n<sub>i</sub>).

We name the variables associated with a process, depending on the specific topology, the left variable, the right variable, the forward variable of Pm, etc. This modeling tactic is used (see [8]) to stipulate that the update functions for the variables be process-index independent.

Two transitions g : ρ → A : h and g′ : ρ′ → A′ : h′ are equivalent if g = g′, h = h′, ρ is semantically equivalent to ρ′, and A and A′ are semantically equivalent (*cf.* [8]). Processes P<sub>m</sub> and P<sub>n</sub> are equivalent if there is a bijective mapping between equivalent transitions of P<sub>m</sub> and P<sub>n</sub>. A permutation π of process indices is an automorphism of P if P<sub>m</sub> is equivalent to P<sub>π(m)</sub> for all m ∈ [0..k − 1].

As shown in [8], the global symmetries of the program P, essentially the permutations of [0..k − 1] that leave P unchanged, are a subset of the global symmetries of the global state space G. From P, one defines an undirected graph, the *communication relation*, CR [8]. The nodes of CR are the nodes N of the topology (N, E), and there is an edge from m to n in CR iff the nodes are connected to a common edge.

P is *normal* [8] if the transitions of P are given in the following form:

$$g: \left(\bigwedge_{n \in CR(m)} \rho(m, n)\right) \to \left(\bigwedge_{n \in CR(m)} A(m, n)\right) : h$$

where each ρ(m, n) is a boolean expression over the internal state of P<sub>m</sub> and the neighborhood variables of P<sub>m</sub>, or equality tests between the variables local to the neighborhood of P<sub>m</sub>, and the assignments of A(m, n) are concurrent assignments to the neighborhood variables of P<sub>m</sub>, where variable values may be swapped with each other or assigned constant values. When P is a normal process network, [8] showed that global symmetries of CR are symmetries of P and are automorphisms of G.

This setting substantially simplifies the application of local symmetry. First, the balance relation can be "read off" directly from the relation CR, as by results in [17], the global symmetries of CR define a balance relation over (N,E), which includes (m, β, n) if there is a symmetry π of CR such that π(m) = n. Secondly, if CR induces a transitive symmetry group, then local symmetry reduction reduces to analysis of a single representative process and its neighborhood. This may result in an exponential reduction in the cost of model checking, compared with an analysis of the entire state space. (The global symmetry used in [8] provides an exponential reduction only when CR is fully symmetric.) The check is in general over-approximate (cf. Corollary 2) but is exact under outward-facing interaction. In the parametric setting, the reduction is unbounded.
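For a small, fixed CR graph, the step of reading off node classes from the symmetries of CR can be sketched by brute force. The code below enumerates all node permutations, keeps those that preserve the edge set, and groups nodes by orbit. It is an illustration only: the function names are hypothetical, the enumeration is factorial in the number of nodes, and the classes obtained from global symmetries may be finer than those of the largest balance relation.

```python
from itertools import permutations

def automorphisms(nodes, edges):
    """All permutations of nodes that map the undirected edge set onto itself."""
    edge_set = {frozenset(e) for e in edges}
    result = []
    for image in permutations(nodes):
        pi = dict(zip(nodes, image))
        if {frozenset((pi[a], pi[b])) for a, b in edges} == edge_set:
            result.append(pi)
    return result


def symmetry_classes(nodes, edges):
    """Group nodes m, n whenever some automorphism pi has pi(m) = n."""
    autos = automorphisms(nodes, edges)
    classes, seen = [], set()
    for m in nodes:
        if m in seen:
            continue
        orbit = sorted({pi[m] for pi in autos})
        classes.append(orbit)
        seen.update(orbit)
    return classes
```

A ring yields a single class (the symmetry group is transitive), so one representative suffices, whereas a star separates the hub from the leaves.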

#### **5.2 Tilings**

Rings, tori, and other 'regular' network patterns have considerable local symmetry but little global symmetry. Here we show how to enforce local symmetry across network families by generating them from a finite set of *tiles*. The tiles directly induce local symmetries and balance.

Consider a fixed, finite set of process types where each process type has a fixed, finite set of edge directions, which are given unique names. The initial condition and the transition relation of a process type may refer to the values on edges in the given direction. Each type is associated with a tile describing a fixed neighborhood pattern around a node of that type. The pattern specifies for each edge connected to the central node its direction from the center and the type and direction of the other process connected to it. The tiles induce a family of networks, typically of unbounded size, as follows. A network is in the family if (1) each node is assigned an instance of a process type, and (2) the neighborhood of a node matches the tile for that node type. For instance, a tile for a torus shape would have 4 neighbors, labeled north, south, east and west.

A network family constructed in this manner has an induced balance relation, B, defined as follows. Let m, n be nodes of a network in the family. Let (m, β, n) belong to B if (a) both nodes are instances of the same type and (b) β is the mapping which, for each direction a, relates the edge reachable in direction a from m to the edge reachable in the same direction from n. (E.g., it maps the north edge of m to the north edge of n.)
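For a torus, the induced map β can be written down directly from the direction names. The sketch below uses an assumed edge-naming scheme (an axis tag plus the coordinates of the edge's west or north endpoint), which is not from the text, and assumes at least two rows and two columns so that the four directions give distinct edges.

```python
def torus_edges(rows, cols):
    """Map each node (r, c) to its edges, keyed by direction.

    Edge ("h", r, c) joins (r, c) to its east neighbour, and ("v", r, c)
    joins (r, c) to its south neighbour, so the east edge of a node is the
    same edge as the west edge of its east neighbour.
    """
    by_direction = {}
    for r in range(rows):
        for c in range(cols):
            by_direction[(r, c)] = {
                "east": ("h", r, c),
                "west": ("h", r, (c - 1) % cols),
                "south": ("v", r, c),
                "north": ("v", (r - 1) % rows, c),
            }
    return by_direction


def tile_beta(by_direction, m, n):
    """The map beta for (m, beta, n): each edge of m goes to the edge of n
    reached in the same direction."""
    return {by_direction[m][d]: by_direction[n][d] for d in by_direction[m]}
```

Because all nodes share one tile, any two nodes m, n are related, and the maps for m, n and for their neighbors in a given direction agree on the edge they share, which is the agreement required by the balance condition.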

**Theorem 7.** B *is a balance relation for the induced family, with finitely many equivalence classes.*

**Proof:** We show that B is a balance relation, and that it is respected by the process assignment. The mapping β is an isomorphism of the edges connected to m and n, as both have the same type. Moreover, as their initial conditions and transition relations are derived from those of the type and are independent of node identities, they are isomorphic up to β.

We now establish that B meets the balance condition. Consider a direction a. Let m′ (n′) be the node connected to m (n) in that direction. As m and n have the same tiling pattern, m′ and n′ have the same type, so the tuple (m′, γ, n′) is in B, for the isomorphism γ between the edges of m′ and n′ as given in the definition of B. Consider the edge e reached from m in direction a, and let b be the direction from which this edge is reached from m′. Let f be the edge in direction a from n. As m and n follow the same tiling pattern, f must be reached from direction b from n′. Therefore, β and γ agree on this edge. As the edge was chosen arbitrarily, this establishes the balance condition. The number of equivalence classes induced by the greatest balance relation is, then, at most the number of tiles, which equals the number of process types. **EndProof.**

Theorem 7 implies that the compositional analysis of all instances of the network family can be reduced to the analysis of a finite set of representatives. This contrasts with global symmetry reduction for network families, where parameterized collapse is not as simple, nor as general. Moreover, the required representatives are just the tiles. The easy syntactic symmetry reduction contrasts with the difficulty of computing global symmetry groups for network families.

### **6 Applications**

**Example 1.** Consider a non-deterministic token-ring system P = ||<sub>i</sub> P<sub>i</sub>. The internal states of P<sub>i</sub> range over {*T*, *H*, *E*} with shared variables x<sub>i</sub> and x<sub>i+1</sub> ranging over {⊥, *tok*}. Initially, each process is in internal state *T* and either owns 0 tokens or owns 1 token. The initial condition specifies that a single process owns the token. Processes cycle through states in the order *T*, *H* and *E*. A process in *H* can move to *E* only if it owns the token. When exiting *E* the process puts the token on its right and enters *T*. If a process is in *T* and has the token, then it either enters *H* or passes the token to the right. It can be shown that the process interactions are outward-facing. Verification of the mutual exclusion property *for all* i: AG(E<sub>i</sub> → (x<sub>i</sub> = *tok*)) can then be performed on a model with 3 processes that suffices to see all reachable local states.
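As a sanity check for a fixed ring size, the protocol can be explored by brute force in the global state space. This is not the local method of the paper, only an explicit-state sketch under modelling assumptions that the text does not fix: process i owns the token exactly when `x[i]` is true, passing right clears `x[i]` and sets `x[(i + 1) % k]`, and a process in *T* may enter *H* whether or not it holds the token.

```python
from collections import deque

def token_ring_reachable(k):
    """Reachable global states of a k-process token ring (k >= 2).

    A global state is (internal, x): internal[i] is one of "T", "H", "E",
    and x[i] is True when process i owns the token (modelling assumption).
    """
    init = [(("T",) * k, tuple(j == i for j in range(k))) for i in range(k)]

    def successors(state):
        internal, x = state
        for i in range(k):
            def update(loc, passing=False):
                new_internal = internal[:i] + (loc,) + internal[i + 1:]
                new_x = list(x)
                if passing:
                    new_x[i] = False
                    new_x[(i + 1) % k] = True
                return new_internal, tuple(new_x)

            if internal[i] == "T":
                yield update("H")                    # start waiting
                if x[i]:
                    yield update("T", passing=True)  # pass the token on
            elif internal[i] == "H" and x[i]:
                yield update("E")                    # enter with the token
            elif internal[i] == "E":
                yield update("T", passing=True)      # exit, token to the right

    seen = set(init)
    queue = deque(init)
    while queue:
        state = queue.popleft()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def mutual_exclusion_holds(k):
    """Check AG(E_i -> x_i = tok) for every i over the reachable states."""
    return all(x[i] for internal, x in token_ring_reachable(k)
               for i in range(k) if internal[i] == "E")
```

The number of global states grows exponentially with `k`, which is the cost that the local analysis on a fixed-size representative model is meant to avoid.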

In addition, a liveness property, *for all* i : AG(H<sub>i</sub> → AF E<sub>i</sub>), can also be verified using a combination of local arguments. The proof is constructed as follows: first, show that the system satisfies the invariant that there is exactly 1 token in the system. Then show that every process that has the token eventually passes the token to the neighbor on the right. Using the global system fairness assumption that each process executes infinitely often, we can chain these proofs together to conclude that for any particular process P<sub>n</sub>: AG(H<sub>n</sub> → AF E<sub>n</sub>) holds, which by local symmetry implies: *for all* i : AG(H<sub>i</sub> → AF E<sub>i</sub>).

**Example 2.** Interestingly, the results about a single token ring network can be extended to a ring with 2 tokens. However, the minimal model requires 4 processes. Similar reasoning holds for 3 tokens and we hypothesize can be generalized to any fixed number of tokens. A related example is a ring with 2 types of processes, one labeled *red* and one labeled *black*. For rings with even numbers of processes, half of them *red* and half of them *black*, there are 2 equivalence classes. Local symmetry reduction can be used to verify behavior of the two equivalence classes for any even number of processes, though the networks have little global symmetry and do not have transitive symmetry.

**Example 3.** Several works including [3,9,10,14] have considered using counting arguments as a way of implementing full symmetry reduction. Given an n-process system, with isomorphic processes having local state spaces of size m, and full global symmetry on [1..n], the idea is to replace the global symmetry-reduced model with a set of m counters, where the counter values record the number of components in each of the different local states. A combinatorial argument [22] shows that the number of combinations of n isomorphic processes, each with m local states, is (m + n − 1)!/(n!(m − 1)!). If n > 2m, this is more than 2<sup>m</sup>. On the other hand, if each component has b neighbors, the local representative (full global symmetry implies a single balance class) has a local state space of size approximately m<sup>b</sup>. Over a parametric analysis m<sup>b</sup> is a constant and b, the number of neighbors, is likely to be small in comparison with m.
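The two sizes compared above are easy to tabulate. The sketch below evaluates the counting formula as a binomial coefficient, since (m + n − 1)!/(n!(m − 1)!) is the number of multisets of size n over m local states; the function names are illustrative.

```python
from math import comb

def counter_states(n, m):
    """Number of counter valuations for n isomorphic processes with m local
    states each: (m + n - 1)! / (n! (m - 1)!)."""
    return comb(m + n - 1, n)


def local_states(m, b):
    """Approximate size of the local space of a representative with b
    neighbours, as estimated in the text."""
    return m ** b
```

For fixed m and b, `local_states(m, b)` does not change as n grows, whereas `counter_states(n, m)` grows polynomially in n with degree m − 1.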

### **7 Discussion and Related Work**

We studied the relationship between the satisfaction of temporal properties on the global state space of a process network and on individual local state spaces. We show that "balanced" processes have bisimilar local spaces and therefore satisfy the same local mu-calculus formulas. Moreover, for a local formula f(n) that is universal in nature, the satisfaction of f(n) on the local space of node n implies that f(n) holds of the global state space. Thus, if universal formulas {f(n)} hold for all nodes n, then (∀i : f(i)) holds for the global state space. This provides an approximate way to establish quantified mu-calculus properties. As balanced nodes satisfy the same formulas, it is only necessary to model-check representatives of the balance equivalence relation. For a fixed process network, the restriction to local state spaces can result in exponential savings (in the number of nodes), and the further restriction to representative spaces results in a further linear cost saving. More dramatically, we show that network families constructed from building-block "tiles" have a finite set of representative nodes, so the cost saving is unbounded for parametric analysis. When network processes communicate with their neighbors in an outward-facing manner, these results carry over to the entire local mu-calculus, not just to universal properties.

The results build on our earlier work on balance relations and local symmetry [17,18,20]. That work focused on compositional invariants [21], the central result being that the strongest compositional invariants for balanced nodes are isomorphic. The current paper shows that the isomorphism applies to all local mu-calculus properties. The local state spaces on which the mu-calculus properties are evaluated are built using compositional invariants. An elegant methodology using 3-valued logic to compositionally verify mu-calculus properties is developed in [23]; however, it applies to pairs of processes, and thus does not consider symmetries in larger networks. The definition of network families through tilings has similarities to the network grammars used in [24,26]; however, the verification techniques are different.

The framework of this paper considers the neighborhood of a single node. Compositional invariants have been generalized to apply to groups of processes, to accommodate properties stated over all pairs i, j, or over all neighbors i, j; see for example [1,6,12,13,16]. Construction of a comprehensive theory of neighborhood symmetry for groups of processes is still an open question.

Global symmetry reduction, developed in [5,8,14], is based on a beautiful mathematical theory of automorphisms in graphs. However, in practice, symmetry reduction runs into difficulties, usually because there is not enough global symmetry in a process network, but also because, even for highly symmetric networks, symbolic manipulation of symmetry-reduced structures is difficult. In fact, [5] shows that any BDD-based representation of the global symmetry group for any network with only transitive symmetry would likely incur a prohibitive cost. By focusing on local similarities, a strict generalization of global symmetries [17,20], we can avoid these problems and obtain exponential improvements. The theory of local symmetries is based on network groupoids, and we note that any network automorphism group induces a balance relation.

We also consider parameterized verification. For network families built from building-block tiles, there is a finite set of representative neighborhoods, and it suffices to prove a parameterized local mu-calculus property for each of those representatives to show that it holds for the entire family. This is an approximate method for parameterized verification. In prior work [20], we had introduced the local PCMCP (parameterized compositional model-checking) question as a decision problem that is, in many cases, more tractable than the global PMCP (parameterized model-checking) problem. Deciding PCMCP for local mu-calculus properties is a challenging open question.

**Acknowledgements.** Kedar Namjoshi was supported, in part, by grant CCF-1563393 from the National Science Foundation. Richard Trefler was supported, in part, by an Individual Discovery Grant from the Natural Sciences and Engineering Research Council of Canada. Both authors thank E. Allen Emerson for inspiring discussions on the topic.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Bayesian Statistical Parameter Synthesis for Linear Temporal Properties of Stochastic Models**

Luca Bortolussi<sup>1</sup> and Simone Silvetti<sup>2,3(✉)</sup>

<sup>1</sup> University of Trieste, Trieste, Italy
lbortolussi@units.it

<sup>2</sup> University of Udine, Udine, Italy
simone.silvetti@gmail.com

<sup>3</sup> Esteco S.p.A., Trieste, Italy

**Abstract.** Parameterized verification of temporal properties is an active research area, being extremely relevant for model-based design of complex systems. In this paper, we focus on parameter synthesis for stochastic models, looking for regions of the parameter space where the model satisfies a linear time specification with probability greater (or less) than a given threshold. We propose a statistical approach relying on simulation and leveraging a machine learning method based on Gaussian Processes for statistical parametric verification, namely Smoothed Model Checking. By injecting active learning ideas, we obtain an efficient synthesis routine which is able to identify the target regions with statistical guarantees. Our approach, which is implemented in Python, scales better than existing ones with respect to the state space of the model and the number of parameters. It is applicable to linear time specifications with time constraints and to more complex stochastic models than Markov Chains.

**Keywords:** Parameter synthesis · Parametric verification · Smoothed model checking · Gaussian Processes

### **1 Introduction**

*Overview.* Stochastic models are commonly used in many areas to describe and reason about complex systems, from molecular and systems biology to performance evaluation of computer networks. In all these cases, the system dynamics is usually described by high-level languages such as Chemical Reaction Networks [1], population models [2] or Stochastic Petri Nets [3], which generate an underlying *Continuous Time Markov Chain* (CTMC). Formal reasoning about these models often amounts to the computation of reachability probabilities. This is the basic tool behind successful *Stochastic Model Checking* tools like PRISM [4] or the more recent STORM [5]. These tools implement numerical algorithms that compute probabilities up to a given precision, although these suffer from state space explosion, as well as simulation engines that allow statistical estimation when models are too large.

All classic quantitative verification tools assume that a model is fully specified, which is typically a strong assumption, particularly in application domains like systems biology, where many model parameters are estimated from data or are only known to belong to a given range. An alternative approach is that of parameterised verification, which tries to verify properties for a whole set of models, indexed by some parameters. In the case of stochastic models, this typically requires us to compute how reachability probabilities change as a function of model parameters, which is a much harder task [6]. A related problem is that of synthesis [7], where one looks for a subset of the parameter space where a given property (or multiple properties [8]) is guaranteed to be satisfied. Alternatively, one can try to design a system by finding a value that maximises the probability of satisfying a specification.

*Problem Statement.* In this paper, we focus on parameter synthesis for CTMC models described by chemical reaction networks, benchmarking against the approach of [7].

More specifically, we consider the following problem. We have a collection of CTMCs, indexed by a parameter vector *θ* ∈ Θ, taking values in a bounded and compact hyperrectangle Θ ⊂ ℝ<sup>k</sup>. We assume that the CTMCs depend on *θ* through their rates, and that this dependency is smooth. We consider a linear time specification φ described by Metric Interval Temporal Logic [9], with bounded time operators. For each φ and *θ*, we can in principle compute the probability that a random trajectory, generated by that specific CTMC, satisfies it, i.e. P<sub>φ</sub>(*θ*).

Our goal is to find a partition of the parameter space Θ composed of three classes: the positive class P<sub>α</sub>, which is composed of parameters where the probability of satisfying φ is higher than a threshold value α; the negative class N<sub>α</sub>, composed of parameters where this probability is lower than α; and the undefined class U<sub>α</sub>, which collects all the other parameters. Following [7], we will look for a partition where the volume of the undefined class is lower than a given fraction of the volume of Θ. This is the *threshold synthesis problem*.

Our approach will be statistical: we assume that models are too complex to numerically compute bounds on the reachability probability, and we only rely on the possibility of simulating the model. As a consequence, our solution to the parameter synthesis problem will have only statistical guarantees of being correct. For example, if a parameter belongs to P<sub>α</sub>, the confidence of this point satisfying P<sub>φ</sub>(*θ*) ≥ α will be larger than a prescribed probability (typically 95% or 99%), though for most points this probability will be essentially one, and similarly for N<sub>α</sub>. The challenge of such an approach is that estimating the satisfaction probability at many different points in the parameter space by simulation is very expensive and inefficient, unless we are able to share the information carried by simulation runs at neighbouring points in the parameter space.

*Contributions.* We propose a Bayesian statistical approach for parameter synthesis, which leverages a statistical parameterised verification method known as Smoothed Model Checking [6] and the theoretical approximation properties of Gaussian Processes [10]. Being based on a Bayesian inference engine, this naturally gives statistical error bounds for the estimated probabilities. Our algorithm uses active learning strategies to steer the exploration of the parameter space only where the satisfaction probability is close to the threshold. We also provide a prototype implementation of the approach in Python.

Despite being implemented in Python, our approach turns out to be remarkably efficient, being slightly faster than [7] for small models, and outperforming it for more complex and large models or when the number of parameters is increased, at the price of a weaker form of correctness. Compared to [7], we also have an additional advantage: the method treats the simulation engine and the routine that verifies the linear time specification on individual trajectories as black boxes. This means that we can not only treat arbitrary MTL properties (while in [7] there is an essential restriction to non-nested CSL properties, i.e. reachability), but also other more complex linear time specifications (e.g. using hybrid automata, provided that the satisfaction probability is a smooth function of model parameters), and we can also apply the same approach to more complex stochastic models for which efficient simulation routines exist, like stochastic differential equations.

*Related Work.* Parameter synthesis of CTMCs is an active field of research. In [7,11] the authors use Continuous Stochastic Logic (CSL) and uniformization methods for computing exact probability bounds for parametric models of CTMCs obtained from chemical reaction networks. In [12] the same authors extend their algorithm to GPU architectures to improve the scalability. The authors of these two papers solve two problems: one is the threshold synthesis, the other is the identification of a parameter configuration maximising the satisfaction probability. In this paper we focus on the former, as we already presented a statistical approach to deal with the latter problem in [13] for the single-objective case and in [8] for the multi-objective case. An alternative statistical approach for multi-objective optimisation is that of [14], where the authors use an ANOVA test to estimate the dominance relation. Another approach to parameter synthesis for CTMCs is [15], where the authors rely on a combination of discretisation of parameters with a refinement technique.

In this work we use a statistical approach to approximate the satisfaction probability function, building on Smoothed Model Checking [6]. This approach is applicable to CTMCs with rate functions that are smooth with respect to parameters, and leverages statistical tools based on Gaussian Process regression [10] to learn an approximation of the satisfaction function from few observations. Moreover, this approach allows us to deal with a richer class of linear time properties than reachability, like those described by Metric Temporal Logic [9,16], for which numerical verification routines suffer heavily from state space explosion [17]. Another statistical approach is that of [18], which combines sensitivity analysis, statistical model checking and uniform continuity to approximate the satisfaction probability function, but it is restricted to cases when the satisfaction probability is monotonic in the parameters. In contrast, Gaussian Process-based methods have no such restriction (as Gaussian Processes are universal approximators), and also have the advantage of requiring far fewer simulations than pointwise statistical model checking, as information is shared between neighbouring points (see [6] for a discussion in this sense). Parametric verification and synthesis approaches are more consolidated for Discrete Time Markov Chains [19], where mature tools like PROPhESY exist [20]. These rely on a symbolic representation of the reachability probability, which does not generalise to the continuous time setting.

*Paper Structure.* The paper is organized as follows. In Sect. 2 we discuss background material, including Parametric CTMCs, MITL, Smoothed Model Checking, and Gaussian Processes. In Sect. 3 we present our method in detail. In Sect. 4 we discuss experimental results, comparing with [7]. Conclusions and future work are discussed in Sect. 5.

### **2 Background**

In this section we introduce the relevant background material: a formalism to describe the systems of interest, i.e. Parametric Chemical Reaction Networks, and one to describe linear time properties, i.e. Signal Temporal Logic. We then present smoothed model checking [21] and Gaussian Processes [10], which form the underlying statistical backbone of the parameter synthesis.

#### **2.1 Parametric Chemical Reaction Networks**

Chemical Reaction Networks [1] are a standard model of population processes, known in literature also as Population Continuous Time Markov Chains [2] or Markov Population Models [22]. We consider a variant with an explicit representation of kinetic parameters.

**Definition 1.** *A Parametric Chemical Reaction Network (PCRN)* M *is a tuple* (S, **X**, D, **x0**, R, Θ) *where*


$$r\_j: u\_{j,1}s\_1 + \ldots + u\_{j,n}s\_n \stackrel{\alpha\_j}{\longrightarrow} w\_{j,1}s\_1 + \ldots + w\_{j,n}s\_n,$$

*where* u<sub>j,i</sub> *(*w<sub>j,i</sub>*) is the amount of elements of species* s<sub>i</sub> *consumed (produced) by reaction* r<sub>j</sub>*. With* *u*<sub>j</sub> = (u<sub>j,1</sub>, ..., u<sub>j,n</sub>) *(and similarly* *w*<sub>j</sub>*),* *v*<sub>j</sub> = *w*<sub>j</sub> − *u*<sub>j</sub>*.*

– *θ* = (θ<sub>1</sub>, ..., θ<sub>k</sub>) *is the vector of (kinetic) parameters, taking values in a compact hyperrectangle* Θ ⊂ ℝ<sup>k</sup>*.*

To stress the dependency of M on the parameters *θ* ∈ Θ, we will often write M<sub>*θ*</sub>. A PCRN M<sub>*θ*</sub> defines a Continuous Time Markov Chain [2,23] on D, with infinitesimal generator Q, where

$$Q\_{\mathbf{x},\mathbf{y}} = \sum\_{r\_j \in R} \{\alpha\_j(\mathbf{x}, \boldsymbol{\theta}) \mid \mathbf{y} = \mathbf{x} + \mathbf{v}\_j\}, \qquad \mathbf{x} \neq \mathbf{y}.$$

We denote by P<sub>*θ*</sub> the probability over the paths Path<sup>M<sub>*θ*</sub></sup> of such a CTMC.

#### **2.2 Metric Interval Temporal Logic**

Metric Interval Temporal Logic (MITL [16]) is a discrete linear time temporal logic used to reason about the future evolution of a path in continuous time. Generally this formalism is used to qualitatively describe the behaviors of trajectories of differential equations or stochastic models. The temporal operators we consider are all time-bounded, like in Signal Temporal Logic [9], a signal-based version of MITL. This implies that time-bounded trajectories are sufficient to verify every formula. The atomic predicates of MITL are inequalities on a set of real-valued variables, i.e. of the form μ(*X*) := [g(*X*) ≥ 0], where g : ℝ<sup>n</sup> → ℝ is a continuous function and consequently μ : ℝ<sup>n</sup> → {⊤, ⊥}.

**Definition 2.** *A formula* φ ∈ F *of MITL is defined by the following syntax:*

$$\phi := \bot \mid \top \mid \mu \mid \neg \phi \mid \phi \lor \phi \mid \phi \mathbf{U}\_{[T\_1, T\_2]} \phi,\tag{1}$$

*where* μ *are atomic predicates as defined above, and* T<sub>1</sub> < T<sub>2</sub> < +∞*.*

The eventually and globally modal operators are defined as customary as **F**<sub>[T<sub>1</sub>,T<sub>2</sub>]</sub>φ ≡ ⊤ **U**<sub>[T<sub>1</sub>,T<sub>2</sub>]</sub>φ and **G**<sub>[T<sub>1</sub>,T<sub>2</sub>]</sub>φ ≡ ¬**F**<sub>[T<sub>1</sub>,T<sub>2</sub>]</sub>¬φ. MITL formulae are interpreted over the paths *x*(t) of a PCRN M<sub>*θ*</sub>. We will consider here the Boolean semantics of [9], which, given a trajectory *x*(t), returns either true or false; we refer the reader to [9] for its definition and for a description of monitoring algorithms. Combining this with the probability distribution P<sub>*θ*</sub> over trajectories induced by a PCRN model M<sub>*θ*</sub>, we obtain the satisfaction probability of a formula φ as

$$P\_{\phi}(\theta) \equiv P(\phi \mid \mathcal{M}\_{\theta}) := P\_{\theta}(\{x(t) \in \operatorname{Path}^{\mathcal{M}\_{\theta}} \mid (x, 0) \models \phi\}).$$

#### **2.3 Parametric Verification and Smoothed Model Checking**

Given an MITL formula φ and a CTMC M*<sup>θ</sup>* , we consider two verification tasks:


The classic verification task can be solved with specialised numerical algorithms [17,24]. These methods calculate P<sub>φ</sub>(*θ*) by a clever numerical integration of the Kolmogorov equations of the CTMC. This approach, however, suffers from the curse of state space explosion, becoming inefficient for large or complex models. A viable alternative is rooted in statistics. The key idea is to estimate the satisfaction probability by combining simulation and monitoring of MITL formulas. In practice, for each trajectory *x* generated by a simulation of the CTMC M<sub>*θ*</sub>, we verify whether *x* ⊨ φ. This produces observations of a Bernoulli random variable Z<sub>φ</sub>, which is equal to 1 if and only if the trajectory satisfies the property, and 0 otherwise. By definition, the probability of observing 1 is exactly P<sub>φ</sub>(*θ*), which can thus be estimated by frequentist or Bayesian statistical inference [25,26].
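To make the statistical route concrete, the following minimal sketch estimates a satisfaction probability from Bernoulli observations. Everything in it is an illustrative assumption rather than the setup used in the paper: the toy birth-death CTMC, its rates, the property "eventually X ≥ 10 within the time horizon", and all function names are ours.

```python
import random

def simulate_birth_death(birth_rate, death_rate, x0, t_end, rng):
    """Gillespie-style simulation of a toy birth-death CTMC.

    Returns the sequence of visited states up to time t_end (jump times
    are not needed by the simple monitor below).
    """
    t, x = 0.0, x0
    states = [x]
    while True:
        birth, death = birth_rate, death_rate * x
        total = birth + death
        if total == 0.0:          # absorbing state: no reaction can fire
            break
        t += rng.expovariate(total)
        if t > t_end:
            break
        x += 1 if rng.random() * total < birth else -1
        states.append(x)
    return states

def eventually_reaches(states, level):
    """Boolean monitor for the property F_[0, t_end] (X >= level)."""
    return any(x >= level for x in states)

def estimate_satisfaction(theta, runs, seed=0):
    """Estimate P_phi(theta) from `runs` Bernoulli observations,
    where theta plays the role of the birth rate."""
    rng = random.Random(seed)
    hits = sum(
        eventually_reaches(simulate_birth_death(theta, 0.1, 5, 10.0, rng), 10)
        for _ in range(runs)
    )
    frequentist = hits / runs
    # Posterior mean under a uniform Beta(1, 1) prior,
    # i.e. the mean of Beta(hits + 1, runs - hits + 1).
    bayesian = (hits + 1) / (runs + 2)
    return hits, frequentist, bayesian
```

Each call to the monitor yields one observation of Z<sub>φ</sub>; the two returned estimates correspond to the frequentist and Bayesian options mentioned above.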

Parametric verification brings additional challenges. For PCRN, the numerical approach of [27] provides upper and lower bounds on the satisfaction function. By decomposing the parameter space in small regions, one can provide a tight approximation of the satisfaction function, at the price of a polynomial cost in the dimension of the state space and of an exponential cost in the dimension of the parameter space [27].

The statistical counterpart for parametric verification is known as Smoothed Model Checking [6]. This method combines simulations at a few points of the parameter space with state-of-the-art generalised regression methods from statistics and machine learning to infer an analytic approximation of the satisfaction function, mapping each *θ* to the corresponding value of P<sub>φ</sub>(*θ*). The basic idea is to cast the estimation of the satisfaction function as a learning problem: from the observation of a few simulation runs at some points of the parameter space, we wish to learn an approximation of the satisfaction function, with statistical error guarantees. Smoothed Model Checking solves this problem relying on Gaussian Process (generalised) regression, a Bayesian non-parametric method that returns at each point an estimate of the value of the satisfaction function together with confidence bounds, defining the region containing the true value of the function with a prescribed probability. The only substantial requirement for Smoothed Model Checking is that the satisfaction probability is smooth with respect to the parameters. This holds for MITL properties interpreted over PCTMCs [6]. Smoothed Model Checking will be the key tool for our synthesis problem, hence we will introduce it in more detail, after a brief introduction of its underlying inference engine, i.e. Gaussian Processes.

**Gaussian Processes.** Gaussian Processes (GPs) are a family of distributions over function spaces, used mostly for Bayesian non-parametric classification or regression. More specifically, a GP is a collection of random variables f(*x*) ∈ ℝ (*x* ∈ E, a compact subset of ℝ<sup>h</sup>) of which any finite subset defines a multivariate normal distribution. A GP is uniquely determined by its mean and covariance functions (the latter also called kernels), denoted respectively by m : E → ℝ and k : E × E → ℝ, such that for every finite set of points (*x*<sub>1</sub>, *x*<sub>2</sub>, ..., *x*<sub>n</sub>):

$$f \sim \mathcal{GP}(m, k) \iff (f(x\_1), f(x\_2), \dots, f(x\_n)) \sim \mathcal{N}(\mathbf{m}, K) \tag{2}$$

where **m** = (m(*x*<sub>1</sub>), m(*x*<sub>2</sub>), ..., m(*x*<sub>n</sub>)) is the mean vector and K ∈ ℝ<sup>n×n</sup> is the covariance matrix, such that K<sub>ij</sub> = k(*x*<sub>i</sub>, *x*<sub>j</sub>). From a functional point of view, a GP is a probability distribution on the set of functions g : E → ℝ. The choice of the covariance function is important from a modeling perspective because it determines which functions will be sampled with higher probability from a GP, see [10].

GPs are popular as they provide a Bayesian non-parametric framework for regression and classification. Starting from a training set {(*x*<sub>i</sub>, y<sub>i</sub>)}<sub>i=1,...,n</sub> of input *x*<sub>i</sub> and output y<sub>i</sub> pairs, and a prior GP, typically with zero mean and a given covariance function, GP regression computes a posterior distribution given the observations, which is another GP, whose mean and covariance depend on the prior kernel and the observation points. In particular, for real-valued y<sub>i</sub> and Gaussian observation noise, the posterior mean at a point *x*<sup>∗</sup> is a linear combination of the prior kernel k(*x*<sup>∗</sup>, *x*<sub>i</sub>) evaluated at *x*<sup>∗</sup> and the observation points *x*<sub>i</sub>, with coefficients depending on the observations y<sub>i</sub>. The prior kernel thus plays a central role, and it sometimes depends on hyperparameters, which can be set automatically by optimising the marginal likelihood, as traditionally happens in Bayesian methods [10].

In this work we use the Gaussian Radial Basis Function (GRBF) kernel [10], as samples from a GP defined by it can approximate arbitrarily well any continuous function on a compact set E. The kernel is defined as

$$k(\mathbf{x}\_1, \mathbf{x}\_2) = \exp(-||\mathbf{x}\_1 - \mathbf{x}\_2||^2 / l^2),$$

where l is the lengthscale hyperparameter, which roughly governs how far away observations contribute to predictions at a point (if *x*<sup>∗</sup> and *x*<sub>i</sub> are much more distant than l, then k(*x*<sup>∗</sup>, *x*<sub>i</sub>) is approximately zero). Moreover, l determines the Lipschitz constant of the GRBF kernel, which is √(2/e)/l, and *a fortiori* of the prediction itself (being a linear combination of kernel functions).
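The role of the kernel in GP regression can be illustrated with a short NumPy sketch. It implements the GRBF kernel as written above and the standard zero-mean GP regression formulas with Gaussian noise; the function names, the `noise_var` parameter and its default value are our assumptions, and this is not the Expectation Propagation scheme used by Smoothed Model Checking.

```python
import numpy as np

def grbf_kernel(a, b, lengthscale):
    """k(x1, x2) = exp(-||x1 - x2||^2 / l^2) for all pairs of rows of a and b.

    a has shape (n, d) and b has shape (m, d); the result has shape (n, m).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / lengthscale ** 2)

def gp_posterior(x_train, y_train, x_test, lengthscale, noise_var=1e-6):
    """Posterior mean and pointwise standard deviation of a zero-mean GP."""
    k_train = grbf_kernel(x_train, x_train, lengthscale)
    k_train += noise_var * np.eye(len(k_train))
    k_cross = grbf_kernel(x_test, x_train, lengthscale)
    # The coefficients depend only on the observations y_i ...
    coeffs = np.linalg.solve(k_train, np.asarray(y_train, dtype=float))
    # ... and the mean is a linear combination of kernels k(x*, x_i).
    mean = k_cross @ coeffs
    cov = grbf_kernel(x_test, x_test, lengthscale)
    cov -= k_cross @ np.linalg.solve(k_train, k_cross.T)
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mean, std
```

Far from all observations every kernel value vanishes, so the prediction falls back to the prior (mean 0, standard deviation 1), which is the locality effect governed by l.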

**Smoothed Model Checking.** Smoothed Model Checking is a statistical method to estimate the function P<sub>φ</sub>(*θ*), casting it into a learning problem that takes as input the truth value of φ for several simulations at different parameter values *θ*<sub>1</sub>, ..., *θ*<sub>n</sub>, with few simulation runs (M ≪ +∞) per parameter point. The method tries to reconstruct a real-valued latent function f(*θ*), which is squeezed to [0, 1] via the Probit transform<sup>1</sup> Ψ to give the satisfaction probability at *θ*: P<sub>φ</sub>(*θ*) = Ψ(f(*θ*)). Let us denote by O = [**o**<sub>1</sub>, **o**<sub>2</sub>, ..., **o**<sub>n</sub>] the matrix whose rows **o**<sub>j</sub> are the Boolean M-vectors of the evaluations at *θ*<sub>j</sub>. Hence, each observation **o**<sub>j</sub> is an independent draw from a Binomial(M, P<sub>φ</sub>(*θ*<sub>j</sub>)).

<sup>1</sup> The Probit Ψ(x) = p(Z ≤ x) is the cumulative distribution function of a standard normal distribution Z ∼ N(0, 1), evaluated at the point x.

Smoothed Model Checking plugs these observations into a Bayesian inference scheme, assuming a prior p(f) for the latent variable f. As f is a random function, one can take as a prior a GP, specifying its mean and kernel function, and then invoke Bayes' theorem to compute the joint posterior distribution of f at a prediction point *θ*<sup>∗</sup> and at the observation points *θ*<sub>1</sub>, ..., *θ*<sub>n</sub> as

$$p(f(\boldsymbol{\theta}^\*), f(\boldsymbol{\theta}\_1), \dots, f(\boldsymbol{\theta}\_n) \mid \mathcal{O}) = \frac{1}{Z} p(f(\boldsymbol{\theta}^\*), f(\boldsymbol{\theta}\_1), \dots, f(\boldsymbol{\theta}\_n)) \prod\_{j=1}^n p(\mathbf{o}\_j \mid f(\boldsymbol{\theta}\_j)).$$

In the previous expression, on the right-hand side, Z is a normalisation constant, while p(f(*θ*<sup>∗</sup>), f(*θ*<sub>1</sub>), ..., f(*θ*<sub>n</sub>)) is the prior, which is a Gaussian distribution with mean and covariance matrix computed according to the GP. The term p(**o**<sub>j</sub> | f(*θ*<sub>j</sub>)), instead, is the noise model, which in our case is given by a Binomial density. By integrating out the value of the latent function at the observation points in the previous expression, one gets the predictive distribution

$$p(f(\boldsymbol{\theta}^\*) \mid \mathcal{O}) = \int \prod\_{j=1}^n d(f(\boldsymbol{\theta}\_j)) p(f(\boldsymbol{\theta}^\*), f(\boldsymbol{\theta}\_1), \dots, f(\boldsymbol{\theta}\_n) \mid \mathcal{O}) \,.$$

The presence of a Binomial observation model makes this integral analytically intractable, and forces us to resort to an efficient variational approximation known as Expectation Propagation [6,10]. The result is a Gaussian form for the predictive distribution p(f(*θ*<sup>∗</sup>) | O), whose mean and δ-confidence region are then Probit transformed into [0, 1].
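The Binomial noise model p(**o**<sub>j</sub> | f(*θ*<sub>j</sub>)) that appears in the posterior can be written down directly. The sketch below, with function names of our choosing, summarises the M Boolean outcomes at one parameter point by their number of successes and evaluates the log-likelihood of a latent value f through the Probit transform. It only illustrates the likelihood term, not the Expectation Propagation approximation itself.

```python
from math import comb, log
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # Z ~ N(0, 1)

def probit(f):
    """Probit transform: the standard normal CDF, mapping R to (0, 1)."""
    return _STD_NORMAL.cdf(f)

def binomial_log_likelihood(successes, runs, f):
    """log p(o | f) when o contains `successes` true values out of `runs`
    and the satisfaction probability is Psi(f)."""
    p = probit(f)
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
    return (log(comb(runs, successes))
            + successes * log(p)
            + (runs - successes) * log(1.0 - p))
```

Because this term is not Gaussian in f, its product with the Gaussian prior has no closed-form normalisation, which is why an approximation scheme is needed.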

It is important to stress that the prediction of Smoothed Model Checking, being a Bayesian method, depends on the choice of the prior. In case of Gaussian Processes, choosing the prior means fixing a covariance function, which makes assumptions on the smoothness and density of the functions that can be sampled by the GP. The Gaussian Radial Basis Function is dense in the space of continuous functions over a compact set [28], hence it can approximate arbitrarily well the satisfaction probability function. By setting its lengthscale via marginal likelihood optimization, we are picking the best prior for the observed data.

#### **3 Methodology**

#### **3.1 Problem Definition**

We start by rephrasing the parameter synthesis problem defined in [7] in the context of Bayesian statistics, where truths are quantified probabilistically. The basic idea is that we will exhibit a set of parameters that satisfy the specification with high confidence, which in the Bayesian world means with high posterior probability. To recall and fix the notation, let M<sub>*θ*</sub> be a PCRN defined over a parameter space Θ, φ an MITL formula and P̃<sub>φ</sub>(*θ*) an approximate statistical model of the satisfaction probability of φ at each point *θ*. In the Bayesian setting, P̃<sub>φ</sub>(*θ*) is in fact a posterior probability distribution over [0, 1], hence we can compute for each measurable set B ⊆ [0, 1] the probability p(P̃<sub>φ</sub>(*θ*) ∈ B).

**Problem** (Bayesian Threshold Synthesis): Let M<sub>*θ*</sub>, Θ, φ, and P̃<sub>φ</sub>(*θ*) be as before. Fix a threshold α and consider the threshold inequality P<sub>φ</sub>(*θ*) > α, for the true satisfaction probability P<sub>φ</sub>(*θ*). Fix ε > 0, a volume tolerance, and δ ∈ (0.5, 1], a confidence threshold. The *Bayesian threshold synthesis problem* consists in partitioning the parameter space Θ into three classes P<sub>α</sub> (positive), N<sub>α</sub> (negative) and U<sub>α</sub> (undefined) as follows:


Note that the set P<sub>α</sub> solves the threshold synthesis problem defined above, while N<sub>α</sub> solves the threshold synthesis problem P<sub>φ</sub>(*θ*) < α.

#### **3.2 Bayesian Parameter Synthesis: The Algorithm**

Our Bayesian synthesis algorithm essentially combines Smoothed Model Checking (smMC) with an active learning step to adaptively refine the sets P<sub>α</sub>, N<sub>α</sub>, U<sub>α</sub>, trying to keep the number of simulations of the PCRN M<sub>*θ*</sub> to a minimum. smMC is used to compute a Bayesian estimate of the satisfaction probability, given the samples of the truth of φ accumulated up to a certain point. More specifically, we use the posterior distribution p(P̃<sub>φ</sub>(*θ*)) of the satisfaction probability at each *θ* returned by smMC to compute the following two functions of *θ*:

$$\begin{aligned} &\lambda^+(\boldsymbol{\theta}, \delta) \text{ is such that } p\left(\tilde{P}\_{\phi}(\boldsymbol{\theta}) < \lambda^+(\boldsymbol{\theta}, \delta)\right) > \delta, \\ &\lambda^-(\boldsymbol{\theta}, \delta) \text{ is such that } p\left(\tilde{P}\_{\phi}(\boldsymbol{\theta}) > \lambda^-(\boldsymbol{\theta}, \delta)\right) > \delta. \end{aligned}$$

Essentially, at each point *θ*, λ<sup>+</sup>(*θ*, δ) is the upper bound for the estimate P̃<sub>φ</sub>(*θ*) at confidence δ (i.e. with probability at least δ, the true value P<sub>φ</sub>(*θ*) is less than λ<sup>+</sup>), while λ<sup>−</sup>(*θ*, δ) is the lower bound. These two values will be used to split the parameter space into the three regions P<sub>α</sub>, N<sub>α</sub>, U<sub>α</sub> as follows:

– *θ* ∈ P<sub>α</sub> iff λ<sup>−</sup>(*θ*, δ) > α
– *θ* ∈ N<sub>α</sub> iff λ<sup>+</sup>(*θ*, δ) < α
– U<sub>α</sub> = Θ \ (P<sub>α</sub> ∪ N<sub>α</sub>), with vol(U<sub>α</sub>)/vol(Θ) < ε

To see how λ<sup>+</sup> and λ<sup>−</sup> are computed, recall that smMC computes a real-valued Gaussian process f<sub>φ</sub>(*θ*), with mean function μ and covariance function k, from which the pointwise standard deviation can be obtained as σ(*θ*) = √k(*θ*, *θ*). At each *θ*, the function f<sub>φ</sub>(*θ*) is Gaussian distributed, hence we can compute the upper and lower confidence bounds for the Gaussian, and then squeeze them into [0, 1] by the Probit transform Ψ. Letting β<sub>δ</sub> = Ψ<sup>−1</sup>((δ + 1)/2), as customary when working with Normal distributions, we get:

– λ<sup>+</sup>(*θ*, δ) = Ψ(μ(f̃<sub>φ</sub>(*θ*)) + β<sub>δ</sub> σ(f̃<sub>φ</sub>(*θ*)))
– λ<sup>−</sup>(*θ*, δ) = Ψ(μ(f̃<sub>φ</sub>(*θ*)) − β<sub>δ</sub> σ(f̃<sub>φ</sub>(*θ*)))
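The two bounds and the resulting three-way classification of a single parameter point can be sketched with the Python standard library alone. The function names and the string labels are ours; the latent mean and standard deviation are assumed to come from an smMC-style Gaussian posterior, and δ is assumed to be strictly below 1 because the inverse CDF is undefined at 1.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, whose CDF is the Probit Psi

def confidence_bounds(mean, std, delta):
    """Return (lambda_minus, lambda_plus) for a Gaussian latent value
    with the given posterior mean and standard deviation."""
    beta = _STD_NORMAL.inv_cdf((delta + 1.0) / 2.0)   # beta_delta
    lam_minus = _STD_NORMAL.cdf(mean - beta * std)
    lam_plus = _STD_NORMAL.cdf(mean + beta * std)
    return lam_minus, lam_plus

def classify(mean, std, alpha, delta):
    """Assign a parameter point to the positive, negative or undefined class."""
    lam_minus, lam_plus = confidence_bounds(mean, std, delta)
    if lam_minus > alpha:
        return "P"   # lower bound already above the threshold
    if lam_plus < alpha:
        return "N"   # upper bound already below the threshold
    return "U"       # threshold lies inside the confidence band
```

For δ = 0.95 the multiplier β<sub>δ</sub> is about 1.96, so a latent mean of 0 with unit standard deviation gives the band (0.025, 0.975) and the point stays undefined for any threshold inside it.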


**Input:** Θ parameter space, M PCRN, φ MTL formula, α threshold, ε volume precision, δ confidence

1. S ← initial\_samples(Θ, M, φ)
2. P<sub>α</sub> ← ∅, N<sub>α</sub> ← ∅, U<sub>α</sub> ← Θ
3. **while** true **do**
4. &emsp;λ<sup>+</sup>, λ<sup>−</sup> ← smoothed\_MC(Θ, S)
5. &emsp;P<sub>α</sub>, N<sub>α</sub>, U<sub>α</sub> ← update\_regions(λ<sup>+</sup>, λ<sup>−</sup>, P<sub>α</sub>, N<sub>α</sub>, U<sub>α</sub>)
6. &emsp;**if** vol(U<sub>α</sub>)/vol(Θ) < ε **then**
7. &emsp;&emsp;**return** P<sub>α</sub>, N<sub>α</sub>, U<sub>α</sub>
8. &emsp;**else**
9. &emsp;&emsp;S ← refine\_samples(S, U<sub>α</sub>)
10. &emsp;**end if**
11. **end while**
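The control flow of the algorithm can be transcribed into Python as follows. This is only a skeleton under our own assumptions: every step (sampling, smMC, region update, refinement, volume computation) is passed in as a caller-supplied function with a hypothetical signature, regions are represented as lists of cells, and an iteration cap is added that the pseudocode does not have.

```python
def bayesian_synthesis(theta_space, initial_samples, smoothed_mc, update_regions,
                       refine_samples, volume, eps, max_iterations=100):
    """Skeleton of the synthesis loop; all callables are placeholders."""
    samples = initial_samples(theta_space)
    positive, negative, undefined = [], [], [theta_space]
    total_volume = volume([theta_space])
    for _ in range(max_iterations):
        # Fit the statistical model and obtain the two bound functions.
        lam_plus, lam_minus = smoothed_mc(theta_space, samples)
        positive, negative, undefined = update_regions(
            lam_plus, lam_minus, positive, negative, undefined)
        # Stop once the undefined region is small enough.
        if volume(undefined) / total_volume < eps:
            return positive, negative, undefined
        # Otherwise simulate more, but only inside the undefined region.
        samples = refine_samples(samples, undefined)
    raise RuntimeError("volume tolerance not reached within max_iterations")
```

Passing the steps in as functions mirrors the black-box treatment of the simulator and monitor described in the introduction, and makes each step easy to replace or test in isolation.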

The Bayesian synthesis procedure is described in Algorithm 1, which after initialisation enters the main loop, in which the computation of the positive, negative, and uncertain sets is carried out adaptively until convergence. Before proceeding further, we introduce some notation to describe regular grids, as they are used in the current implementation of the method. Let us consider the hyper-rectangular parameter space

$$\Theta = \times\_{i=1}^{n} [w\_i^-, w\_i^+] \subset \mathbb{R}^n,$$

where w<sub>i</sub><sup>−</sup> and w<sub>i</sub><sup>+</sup> are respectively the lower and the upper bound of the domain of the parameter θ<sub>i</sub>. An *h*-grid of Θ is the set

$$\boldsymbol{h}\text{-grid} = \bigcup\_{\boldsymbol{m} \in M} \{\boldsymbol{w}^- + \boldsymbol{m} \* \boldsymbol{h}\},$$

where *h* = (h<sub>1</sub>, ..., h<sub>n</sub>), M = ×<sub>i=1</sub><sup>n</sup> {0, ..., (w<sub>i</sub><sup>+</sup> − w<sub>i</sub><sup>−</sup>)/h<sub>i</sub>}, *w*<sup>−</sup> = (w<sub>1</sub><sup>−</sup>, ..., w<sub>n</sub><sup>−</sup>) and ∗ is the elementwise multiplication. Given a grid, we define as *basic cell* a small hyperrectangle of size *h* whose vertices are points of the grid.
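A direct way to enumerate such a grid is sketched below. The function name is ours, and the code assumes that each step h<sub>i</sub> divides the corresponding side length exactly, so that the upper bounds are themselves grid points.

```python
from itertools import product

def h_grid(lower, upper, step):
    """Enumerate the points w_minus + m * h of an h-grid.

    lower, upper and step are sequences with one entry per parameter.
    """
    axes = []
    for lo, hi, h in zip(lower, upper, step):
        count = int(round((hi - lo) / h))      # largest index m_i on this axis
        axes.append([lo + m * h for m in range(count + 1)])
    # The Cartesian product of the per-axis points gives the full grid.
    return list(product(*axes))

if __name__ == "__main__":
    grid = h_grid([0.0, 0.0], [1.0, 2.0], [0.5, 1.0])
    print(len(grid), grid[0], grid[-1])
```

The number of points is the product of the per-axis counts, which is why grid sizes grow exponentially with the number of parameters.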

**Initialisation.** The initialisation phase consists of running some simulations of the PCRN at some points of the parameter space, to obtain a first reconstruction of the satisfaction function. As we do not need to be very precise in every part of the parameter space, but only for points *θ* whose satisfaction probability P<sub>φ</sub>(*θ*) is close to the threshold α, we start by simulating the model on all parameters of a coarse grid *h*<sub>0</sub>-grid, with *h*<sub>0</sub> chosen such that the total number of parameters *θ* explored is small enough for smMC to be fast. The actual choice will depend on the number of dimensions of the parameter space, as the size of a grid grows exponentially with it. Once the grid *h*<sub>0</sub>-grid is fixed, we simulate N runs of the model for each point and pass them to a monitoring algorithm for MITL, obtaining N observations of the truth value of the property φ at each point of *h*<sub>0</sub>-grid, collected in the set S. We also initialise the sets P<sub>α</sub>, N<sub>α</sub>, and U<sub>α</sub>.

**Computation of the P<sub>α</sub>, N<sub>α</sub>, and U<sub>α</sub> Regions.** The algorithm then enters the main loop, first running smMC with the current set of sample points S to compute the two functions λ<sup>+</sup> and λ<sup>−</sup>. These are then used to update the regions P<sub>α</sub>, N<sub>α</sub>, and U<sub>α</sub>. Here we discuss several possible approaches.

*Approach 1: Fixed Grid.* The simplest approach is to partition the parameter space into small cells, i.e. using an *h*-grid with *h* small, and then assign each cell to one of the sets. The assignment will be discussed later, but it involves evaluating the functions λ<sup>+</sup> and λ<sup>−</sup> at each point of the grid. The method is accurate if each basic cell contains only a fraction of the volume much smaller than ε. However, this requires working with fine grids, whose size blows up quickly with the number of parameters. Practically, this approach is feasible up to dimension 3 or 4 of the parameter space.

*Approach 2: Adaptive Grid.* To scale better with the dimension of the parameter space, we can start evaluating the λ<sup>±</sup> functions on a coarse grid, and refine the grid iteratively only for cells that are assigned to the uncertain set, until a minimum grid size is reached.

Central to both approaches is how to guarantee that all points of a basic cell belong to one set while inspecting only a finite number of them. In particular, we will limit the evaluation of the λ<sup>±</sup> functions to the vertices of each cell c, i.e. to the points in the grid *h*-grid. Intuitively, this will work if the cell has a small edge size compared to the rate of growth of the satisfaction function, and the values of the satisfaction function in its vertices are all (sufficiently) above or below the threshold. However, we need to precisely quantify this "sufficiently". We sketch here two exact methods and a heuristic one, which performs well in practice. We discuss how to check that a cell belongs to the positive set, the negative one being symmetric.

*Method 1: Global Lipschitz bound.* This approach relies on computing the Lipschitz constant L of the satisfaction function. This can be obtained by estimating its derivatives (e.g. by finite differences or, better, by learning them using the methods discussed in [10]), and performing a global optimization of the modulus of the gradient after each call to smMC. Let d(*h*) be the length of the largest diagonal of a basic cell c in an *h*-grid. Consider the smallest value of the satisfaction function in one of the vertices of c, and call it p̂. Then the value of the satisfaction function in the cell is surely greater than p̂ − L·d(*h*)/2 (after decreasing for half the diagonal, we need to increase again to reach the value of another vertex). The test then is p̂ − L·d(*h*)/2 ≥ α.
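As a rough illustration of this test, the sketch below checks one cell given the values at its vertices. The helper names (`cell_diagonal`, `cell_is_positive`) are hypothetical, and the Lipschitz constant is assumed to be supplied by a separate estimation step that is not shown.

```python
import math


def cell_diagonal(h):
    """Length d(h) of the main diagonal of a basic cell with edge sizes h."""
    return math.sqrt(sum(hi * hi for hi in h))


def cell_is_positive(vertex_values, lipschitz, h, alpha):
    """Global Lipschitz test: p_hat - L * d(h) / 2 >= alpha.

    vertex_values: values of the (lower-bound) satisfaction function at the
    vertices of the cell; lipschitz: an estimate of L, assumed given.
    """
    p_hat = min(vertex_values)
    return p_hat - lipschitz * cell_diagonal(h) / 2 >= alpha
```

With the values reported later for Case 1 (L = 4.31 and a grid step of order 10<sup>−4</sup>), the safety margin L·d(*h*)/2 is about 2·10<sup>−4</sup>, so only cells whose smallest vertex value is within that distance of α fail the test.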

*Method 2: Local Lipschitz bound.* The previous method will suffer if the slope of the satisfaction function is large in some small region, as this will result in a large Lipschitz constant everywhere. To improve it, we can split the parameter space into subregions (for instance, by using a coarse grid), and compute the Lipschitz constant in each subregion. An alternative we are investigating is to compute, in each cell of the grid, a lower bound of the function f(θ) learned by the GP directly from its analytic expression.

*Heuristic Method.* In order to speed up computation and avoid computing Lipschitz constants, we can make the function λ<sup>−</sup> stricter. Specifically, we can use a larger β<sub>δ</sub> than the one required by our confidence level δ. For instance, for a 95% confidence β<sub>δ</sub> = 1.96, while we can use instead β<sub>δ</sub> = 3, corresponding to a confidence of more than 99%. We couple this with a choice of the grid step *h* at least one order of magnitude smaller than the lengthscale of the kernel learned from the data (which is inversely proportional to the Lipschitz constant of the kernel and of the satisfaction function), which guarantees that the satisfaction function will vary very little in each cell. We can then be confident that if the strict λ<sup>−</sup> is above the threshold in all vertices of the cell, then the same will hold for all points inside c for the less strict λ<sup>−</sup>.
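The heuristic can be sketched as a vertex-wise classification. The fragment below is a minimal sketch that assumes the bounds have the usual form λ<sup>±</sup> = mean ± β<sub>δ</sub> · standard deviation of the GP posterior; the function name `classify_cell` and the input layout are hypothetical.

```python
def classify_cell(vertex_posteriors, alpha, beta=3.0):
    """Assign a cell to "P", "N" or "U" from the GP posterior at its vertices.

    vertex_posteriors: list of (mean, std) pairs, one per vertex (assumed
    to come from smoothed model checking, which is not reproduced here).
    beta: the stricter multiplier, e.g. 3 instead of 1.96.
    """
    lowers = [mean - beta * std for mean, std in vertex_posteriors]
    uppers = [mean + beta * std for mean, std in vertex_posteriors]
    if all(lo > alpha for lo in lowers):
        return "P"  # lower bound above the threshold at every vertex
    if all(up < alpha for up in uppers):
        return "N"  # upper bound below the threshold at every vertex
    return "U"      # otherwise the cell stays uncertain
```

A vertex with mean 0.125 and standard deviation 0.01 is accepted for α = 0.1 with β<sub>δ</sub> = 1.96 but stays uncertain with β<sub>δ</sub> = 3, which is the extra margin the heuristic relies on.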

**Refinement Step.** After having built the sets P<sub>α</sub>, N<sub>α</sub>, and U<sub>α</sub>, we check whether the volume of U<sub>α</sub> is below the tolerance threshold. If so, we stop and return these sets. Otherwise, we need to increase the precision of the satisfaction function near the uncertain region. This essentially means reducing the variance inside U<sub>α</sub>, which can be obtained by increasing the number of observations in this region. Hence, the refinement step samples points from the undefined region U<sub>α</sub>, simulates the model a few times at each of these points, computes the truth value of φ for each trace, and adds these points to the training set S of the smoothed model checking process. This refinement reduces the uncertainty bound in the undefined region, which leads some parts of this region to be classified as positive (P<sub>α</sub>) or negative (N<sub>α</sub>). We iterate this process until the exit condition vol(U<sub>α</sub>)/vol(Θ) < ε is satisfied. The convergence of the algorithm is rooted in the properties of smoothed model checking, which is guaranteed to converge to the true function with vanishing variance as the number of observation points goes to infinity. In practice, the method converges quite fast, unless the problem is very hard (the true satisfaction function is close to the threshold for a large fraction of the parameter space).
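The control flow of Algorithm 1 can be summarised in a few lines of Python. This is a sketch only: every callable passed in is a placeholder for a component described in the text, the volume tolerance is written `eps`, and the `max_rounds` cap is our own addition (Algorithm 1 loops until the exit condition holds).

```python
def bayesian_synthesis(initial_samples, smoothed_mc, update_regions,
                       volume, refine_samples, total_volume, eps,
                       max_rounds=100):
    """Skeleton of the synthesis loop; all callables are supplied by the caller.

    initial_samples() -> S
    smoothed_mc(S) -> (lam_plus, lam_minus)
    update_regions(lam_plus, lam_minus, P, N, U) -> (P, N, U)
    volume(U) -> volume of the uncertain region
    refine_samples(S, U) -> enlarged sample set
    """
    samples = initial_samples()
    # None stands for "the whole parameter space" before the first update.
    positive, negative, uncertain = [], [], None
    for _ in range(max_rounds):
        lam_plus, lam_minus = smoothed_mc(samples)
        positive, negative, uncertain = update_regions(
            lam_plus, lam_minus, positive, negative, uncertain)
        if volume(uncertain) / total_volume < eps:
            return positive, negative, uncertain
        # Not precise enough yet: add observations inside the uncertain region.
        samples = refine_samples(samples, uncertain)
    raise RuntimeError("volume tolerance not reached within max_rounds")
```

Passing the components as arguments keeps the loop independent of how the regions are represented (fixed grid, adaptive grid, or something else).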

### **4 Results**

**Implementation.** We have implemented our algorithm in Python 3.6. The code is available at http://simonesilvetti.com/pycheck/. To improve the scalability of our algorithm, we profiled it to identify the most computationally expensive steps among simulating the PCRN, checking the MITL formulae at each step, running smMC, and partitioning the state space. The most expensive part in our tests turned out to be the simulation step, which we performed using the Gillespie SSA algorithm [1]. To speed up simulations, we ran them in parallel leveraging the Numba [29] package for Python, which is well suited to executing array-oriented and math-heavy Python code. The execution time of the smoothed model checking step, instead, is substantially independent of the number of repetitions per point and depends on the number of training points. This is why, compared with [6], we increased the number of simulations per parameter point and reduced the number of parameter points. We ran all the experiments on a Dell XPS with an Intel Core i7-7700HQ 2.8 GHz CPU and 8 GB of 1600 MHz memory, running Windows 10 Pro.

**SIR Epidemic Model.** We consider the popular SIR epidemic model [30], which is widely used to simulate the spreading of a disease among a population. The population of N individuals is divided into three classes: susceptible (S), infected (I), and recovered (R).


The version of SIR model we consider is defined by the following two chemical reactions:

$$r_1: S + I \xrightarrow{\alpha_1} 2I \qquad \alpha_1 = k_i \cdot \frac{X_s \cdot X_i}{N}$$

$$r_2: I \xrightarrow{\alpha_2} R \qquad \alpha_2 = k_r \cdot X_i$$

Here, r<sub>1</sub> describes the possibility that a healthy individual gets the disease and becomes infected, and the reaction r<sub>2</sub> models the recovery of an infected agent. We describe the model as a PCRN where k<sub>i</sub> ∈ [0.005, 0.3], k<sub>r</sub> ∈ [0.005, 0.2], and the initial population is (S, I, R) = (95, 5, 0), and we consider the following MITL formula:

$$\phi = (I > 0) \, \mathcal{U}_{[100, 120]} \, (I = 0) \tag{3}$$

This formula expresses that the disease becomes extinct (i.e., I = 0) between 100 and 120 time units. Note that for this model extinction will eventually happen with probability one, but the time of extinction depends on the parameters *θ* = (k<sub>i</sub>, k<sub>r</sub>). In the following, we report experiments to synthesise the parameter region such that P<sub>φ</sub>(*θ*) > α, with α = 0.1, volume tolerance ε = 0.1, and confidence δ = 95%. We consider all possible combinations of free parameters to explore (i.e. k<sub>i</sub> alone, k<sub>r</sub> alone, and k<sub>i</sub> and k<sub>r</sub> together). The initial training set of the smoothed model checking approach has been obtained by sampling the truth value at parameters arranged in a grid as described in Sect. 3, with 40 points for the 1D cases and 400 points for the 2D case. The satisfaction probability of each parameter vector in the training set, as well as of the parameter vectors sampled by the refinement process, has been obtained by simulating the PCRN and evaluating the MITL formula (3) with 1000 repetitions per parameter point.
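To make the setup tangible, here is a plain-Python sketch of one way to estimate the satisfaction probability at a single parameter point, using the direct Gillespie method and the two propensities above. It is not the authors' Numba-accelerated code; the function names are ours. It uses the observation that, since I = 0 is absorbing, formula (3) holds on a trace exactly when the extinction time falls in [100, 120].

```python
import random


def simulate_extinction_time(k_i, k_r, s=95, i=5, r=0, t_max=120.0, rng=random):
    """Run one SSA trajectory; return the first time with I = 0, or None
    if the disease is still present at t_max."""
    n = s + i + r
    t = 0.0
    while i > 0:
        a1 = k_i * s * i / n      # propensity of r1: S + I -> 2I
        a2 = k_r * i              # propensity of r2: I -> R
        total = a1 + a2
        if total == 0.0:
            return None           # no reaction can fire any more
        t += rng.expovariate(total)
        if t > t_max:
            return None
        if rng.random() * total < a1:
            s, i = s - 1, i + 1
        else:
            i, r = i - 1, r + 1
    return t


def satisfies_phi(extinction_time, lo=100.0, hi=120.0):
    """(I > 0) U_[lo,hi] (I = 0) holds iff extinction happens within [lo, hi]."""
    return extinction_time is not None and lo <= extinction_time <= hi


def estimate_satisfaction(k_i, k_r, runs=1000, seed=0):
    """Monte Carlo estimate of the satisfaction probability at (k_i, k_r)."""
    rng = random.Random(seed)
    hits = sum(
        satisfies_phi(simulate_extinction_time(k_i, k_r, rng=rng))
        for _ in range(runs)
    )
    return hits / runs
```

Calling `estimate_satisfaction(0.12, 0.05)` gives one noisy observation of P<sub>φ</sub>(*θ*); smoothed model checking would consume the individual truth values rather than this average.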

**Efficiency, Accuracy, and Scalability.** The execution times of the experiments are reported in Table 1 (left). The results show a good performance of our statistical algorithm, despite it being implemented in Python rather than in a more efficient language like C. The execution times, as a percentage of those of the exact method reported in [7], are 42%, 18% and 7% for Case 1, Case 2 and Case 3, respectively. Our results are reported using the heuristic method to compute the sets and a fixed grid of small step size *h*.

In Case 1, we also compare the three methods to classify the regions, computing the derivative of the satisfaction probability function by finite differences and (i) optimising it globally to obtain the Lipschitz constant (equal to 4.31), or (ii) optimising it in every cell of the fine prediction grid to compute a local Lipschitz constant (in each cell). As for the heuristic method, we use β<sub>δ</sub> = 3 instead of β<sub>δ</sub> = 1.96, and a grid step of order 10<sup>−4</sup>, three orders of magnitude less than the lengthscale of the kernel, which marginal likelihood optimization set to 0.1. All three methods gave the same results for the grid size we used. More specifically, the maximum displacement of the approximated satisfaction probability inside the cell is estimated to be 0.003.

As a statistical accuracy test, we computed the "true" value of the satisfaction probability (by deep statistical model checking, using 10000 runs) for points in the positive and negative sets close to the undefined set, and counted how many times these points were misclassified. More specifically, in Case 1 we consider 300 equally spaced points between 0.1 and 0.07 (consider that a portion of the undefined region is located in a neighborhood of 0.05, see Fig. 1). All points turned out to be classified correctly, which supports the accuracy of the smMC prediction.

We also performed a scalability test with respect to the size of the state space of the PCRN model, increasing the initial population size N of the SIR model (Case 1). The results are reported in Table 1 (right). We increase the initial population size while maintaining the original proportion I/S = 1/19. Moreover, we consider different thresholds α and volume tolerances in order to force the algorithm to execute at least one refinement step, as the shape of the satisfaction function changes with N. The execution time increases moderately, following a linear trend.

**Table 1.** (LEFT) Results for the Statistical Parameter Synthesis for the SIR model with N = 100 individuals and the formula φ = (I > 0) U<sub>[100,120]</sub> (I = 0). We report the mean and standard deviation of the execution time of the algorithm. The volume tolerance is set to 10% and the threshold α is set to 0.1. The h-grid column shows the size *h* of the grid used to compute the positive, negative, and uncertain sets. (RIGHT) Scalability of the method w.r.t. the size of the state space of the SIR model, increasing initial population N. α and δ are the threshold and volume tolerance used in the experiments.


**Fig. 1.** (a), (c) and (d) show the partition of the parameter space for Cases 1, 2, and 3 respectively. The positive area P<sub>α</sub> is depicted in red, the negative area N<sub>α</sub> is in blue and the undefined region U<sub>α</sub> is in yellow. (a) and (c) are one-dimensional cases: on the x-axis we report the parameter explored (respectively k<sub>i</sub> and k<sub>r</sub>), on the y-axis we show the value of the satisfaction function and the confidence bounds (for β<sub>δ</sub> = 3). The green horizontal line is the threshold α = 0.1. (d) shows a two-dimensional parameter space, hence no confidence bound has been represented. The circular dots represent the training set. In (b) we have zoomed in on a portion of the parameter space of (a) to visualize the cells, with base length equal to h and height equal to the span of the confidence bounds. (Color figure online)

#### **5 Conclusions**

We presented an efficient statistical algorithm for parameter synthesis, to identify parameters satisfying MITL specifications with a probability greater than a certain threshold. The algorithm is based on Bayesian statistics and leverages the powerful parametric verification framework of Smoothed Model Checking, integrating it into an active learning refinement loop which drives the computational effort of simulations near the critical region concentrated around the threshold α. The developed approach shows good performance in terms of execution time and outperforms the exact algorithm developed in [7], retaining good accuracy at the price of having only statistical guarantees.

Note that we compared with the performance of [7] and not of their GPU implementation [12], as our method uses only CPU computing power at the moment. However, it can be implemented on a GPU, leveraging e.g. [31]. We expect a substantial increase of the performance. Fully distributing on CPU the computations of the algorithm, beyond only stochastic simulation, is also feasible, the hard part being to parallelise GP inference [32].

Other directions for future work include the implementation of the adaptive grid strategy to construct the P<sub>α</sub>, N<sub>α</sub>, and U<sub>α</sub> regions, given the output of the smMC, and a divide and conquer strategy to split the parameter space (and the uncertain set U<sub>α</sub>) into subregions, to reduce the complexity of the smMC. These two extensions are mandatory to scale the method to higher dimensions, up to 6–8 parameters. To scale even further, we plan to integrate techniques to speed up GP reconstruction: more classical sparsity approximation techniques [10] and more recent methods for GPs tailored to work on grids [33,34]. These techniques have a computational cost of O(n) instead of the O(n<sup>3</sup>) cost of the standard implementation. Finally, we aim to combine our approach with the exact algorithm developed in [7]. The idea is to use our approach for a rough exploration of the parameter space, cutting out the regions that are above or below the considered threshold with high statistical confidence, and to apply the exact approach in the remaining area, when feasible.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# 7th Competition on Software Verification (SV-COMP)

# **2LS: Memory Safety and Non-termination (Competition Contribution)**

Viktor Malík<sup>1,3</sup>, Štefan Martiček<sup>1,3</sup>, Peter Schrammel<sup>1,2(✉)</sup>, Mandayam Srivas<sup>4</sup>, Tomáš Vojnar<sup>3</sup>, and Johanan Wahlang<sup>4</sup>

<sup>1</sup> Diffblue Ltd., Oxford, UK

<sup>2</sup> University of Sussex, Brighton, UK, p.schrammel@sussex.ac.uk

<sup>3</sup> FIT BUT, IT4Innovations Centre of Excellence, Brno, Czech Republic

<sup>4</sup> Chennai Mathematical Institute, Chennai, India

**Abstract.** 2LS is a C program analyser built upon the CPROVER infrastructure. 2LS is bit-precise and it can verify and refute program assertions and termination. 2LS implements template-based synthesis techniques, e.g. to find invariants and ranking functions, and incremental loop unwinding techniques to find counterexamples and *k*-induction proofs. New features in this year's version are improved handling of heap-allocated data structures using a template domain for shape analysis and two approaches to prove program non-termination.

### **1 Overview**

2LS is a static analysis and verification tool for sequential C programs that is based on an algorithm called *k*I*k*I (*k*-invariants and *k*-induction) [1], which combines bounded model checking, *k*-induction, and abstract interpretation into a single, scalable framework. 2LS relies on incremental SAT solving to employ all these techniques simultaneously in order to find proofs and refutations of assertions, as well as to perform termination analysis [2].

This year's competition version introduces a new *abstract shape domain* allowing 2LS to reason about properties of programs manipulating heap and dynamic data structures, and a *non-termination analysis*, which serves as a counterpart to the existing termination analysis and allows 2LS to prove non-termination of a program.

**Architecture.** 2LS is built upon the CPROVER infrastructure [3] and thus uses *GOTO programs* as the internal program representation. It first performs various static analyses and transformations of the program, including resolution of function pointers, points-to analysis, and insertion of assertions guarding against invalid pointer and memory operations. The analysed program is then translated into an acyclic, over-approximate static single assignment (SSA) form, in which loops are cut at the edges returning to the loop head. Subsequently, 2LS refines this over-approximation by computing inductive invariants in various abstract domains represented by parametrised logical formulae, so-called templates [1]. The competition version uses the interval domain for numerical variables and the new shape domain for pointer-typed variables described below.

The Czech authors were supported by the Czech Science Foundation project 17-12465S, the IT4IXS: IT4Innovations Excellence in Science project (LQ1602), and the FIT BUT internal project FIT-S-17-4014.

P. Schrammel: Jury member.

© The Author(s) 2018

D. Beyer and M. Huisman (Eds.): TACAS 2018, LNCS 10806, pp. 417–421, 2018. https://doi.org/10.1007/978-3-319-89963-3\_24

The *k*I*k*I algorithm [1] operates on the SSA form, which is translated into a CNF formula over a bitvector representation of program configurations and given to a SAT solver. This formula is incrementally extended and amended to perform loop unwindings and abstract domain operations. The model returned by the solver is then used either to refine the predicates representing abstract values or to find a counterexample refuting the property to be checked. A more detailed description of the 2LS architecture can be found in the tool paper [7].

### **2 New Features**

For SV-COMP'18, apart from various bug fixes and minor improvements, two major improvements of 2LS have been implemented: support for dealing with inductive list-like data structures and support for proving program non-termination. Although 2LS supports certain interprocedural analyses, the competition version performs both analyses in a monolithic way, i.e. after inlining function calls. These improvements tackle weaknesses observed in previous years in the heap and memory safety categories, and they boost the capabilities of 2LS in non-termination analysis.

#### **2.1 Memory Safety and Heap Invariants**

To support shape analysis of dynamic data structures, a new abstract domain has been added to 2LS to express invariants describing heap configurations in the context of the bitvector logic used by 2LS [4]. The domain is based on recording (1) information about abstract heap objects pointed to by pointer variables and (2) information about reachability of abstract objects using *pointer access paths* [6]. Here, an abstract heap object represents all objects allocated at a given program location. The access paths then record which target abstract objects can be reached from a given source abstract object while going through some set of intermediary objects. For instance, the list in Fig. 1 would be encoded as list = &*o*<sub>1</sub> ∧ *path*(*o*<sub>1</sub>, nxt, {*o*<sub>1</sub>, *o*<sub>2</sub>}, NULL), meaning that list points to an object *o*<sub>1</sub> and there is a path from *o*<sub>1</sub> via nxt fields of abstract objects *o*<sub>1</sub> and *o*<sub>2</sub> to NULL. This representation is integrated as a template over pointer-typed variables and fields of dynamic objects into *k*I*k*I. The template is a parametrised logical formula. The parameters encode the sets of memory objects that can be pointed to by each pointer-typed variable as well as the set of paths that can lead from each dynamic object to other objects. 2LS computes these sets using an incremental SAT solver. This allows 2LS to prove or to refute assertions related to manipulation of dynamically linked data structures. The supported properties include, for instance, null-pointer dereferencing, double free, and memory leaks. Assertions for these properties are automatically instrumented into the code.

**Fig. 1.** A singly-linked list with nodes allocated at two different program locations.
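The meaning of the *path* predicate can be illustrated on a concrete list. The sketch below is not how 2LS works internally (2LS reasons symbolically through a SAT solver); it only checks, for one concrete heap, whether the list matches *path*(source, nxt, via, NULL). The `Node` class, the `site` attribute, and the function name are hypothetical.

```python
class Node:
    """A list node tagged with the program location (site) that allocated it."""

    def __init__(self, site, nxt=None):
        self.site = site
        self.nxt = nxt


def satisfies_path(head, source_site, via_sites, max_steps=10_000):
    """Check list = &source and path(source, nxt, via_sites, NULL) concretely.

    Returns True if head was allocated at source_site and following nxt
    reaches NULL while visiting only nodes whose site is in via_sites.
    """
    if head is None or head.site != source_site:
        return False
    node, steps = head, 0
    while node is not None:
        if node.site not in via_sites or steps > max_steps:
            return False  # foreign object on the path, or NULL never reached
        node, steps = node.nxt, steps + 1
    return True
```

A list whose first node is allocated at site `o1` and whose remaining nodes are allocated at site `o2` satisfies the predicate with `via_sites = {"o1", "o2"}`, whatever its length, which is what makes the abstraction finite.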

#### **2.2 Proving Non-termination**

Last year's version of 2LS provided a technique for proving termination based on linear lexicographic ranking functions synthesised using templates over bitvectors [2], but the tool was unable to prove non-termination except for trivial cases. For SV-COMP'18, two techniques for *proving non-termination* have been added [5]. Both of the approaches are relatively simple, yet appear to be reasonably efficient on the SV-COMP benchmarks.

The first approach is based on finding *singleton recurrence sets*. All loops are unfolded *k* times (with *k* being incrementally increased), followed by a check whether there is some loop *L* and a program configuration that can be reached at the head of *L* after both *k′* and *k* unwindings for some *k′* < *k*. Such a check can be easily formulated in 2LS as a formula over the SSA representation of programs with loops unfolded *k* times. This technique is able to find lasso-shaped executions in which a loop returns to the same program configuration every *k* − *k′* iterations after *k′* initial iterations.
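The idea of a lasso can be shown with an explicit-state sketch. Unlike 2LS, which encodes the check symbolically over all reachable configurations, the hypothetical function below follows a single deterministic run from one given initial configuration and reports the first pair of unwinding depths at which the loop head sees the same configuration.

```python
def find_lasso(initial, guard, step, max_unwind):
    """Return (k_prime, k) with k_prime < k such that the loop head holds the
    same configuration after k_prime and k iterations, or None.

    guard(state) -> bool decides whether the loop body is entered;
    step(state) -> state executes one iteration; states must be hashable.
    """
    first_seen = {}
    state = initial
    for k in range(max_unwind + 1):
        if not guard(state):
            return None               # the loop exits on this run
        if state in first_seen:
            return first_seen[state], k
        first_seen[state] = k
        state = step(state)
    return None                       # no repetition within the bound
```

For the 8-bit loop `while (x != 0) x = x + 2` started at x = 1, the configuration repeats after 128 iterations, so the call `find_lasso(1, lambda x: x != 0, lambda x: (x + 2) & 0xFF, 200)` yields `(0, 128)`; the large number of unwindings needed here is exactly the cost the second approach tries to avoid.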

The second approach tries to reduce the number of unwindings by looking for loops that generate an *arithmetic progression* over every integer variable. More precisely, it looks for loops *L* for which each integer variable *x* can be associated with a constant *c*<sub>x</sub> such that every iteration of *L* changes the value of *x* to *x* + *c*<sub>x</sub>, keeping non-integer variables unchanged. Two queries are used to detect such loops: the first one asks whether there is a configuration ***x*** and a constant vector ***c*** (with the vectors ranging over all integer variables modified in the loop and constants from their associated bitvector domains) such that one iteration of *L* ends in the configuration ***x*** + ***c***, while the second makes sure that there is no configuration ***x*** over which one iteration of *L* would terminate in a configuration other than ***x*** + ***c***. If such a loop *L* and a constant vector ***c*** are found, non-termination of *L* can be proved as follows: First, we gradually exclude each configuration ***x*** reachable at the head of *L* for which there is some *k* such that *L* cannot be executed from ***x*** + *k*·***c*** (intuitively meaning that *L* cannot be executed *k* + 1 times from ***x***). Second, we check whether there remains some non-excluded configuration reachable at the head of *L*.
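For a single unsigned bitvector variable, the exclusion step can be imitated by brute force. The sketch below rests on several simplifying assumptions of our own: one variable, wrap-around arithmetic modulo 2<sup>width</sup>, every value treated as reachable at the loop head, and explicit enumeration instead of the SAT queries used by 2LS. The function names are hypothetical.

```python
from math import gcd


def runs_forever(x, c, guard, width=8):
    """True if the guard holds on every element of x, x + c, x + 2c, ...
    taken modulo 2**width, i.e. no k exists for which the loop cannot be
    executed from x + k * c."""
    modulus = 1 << width
    # The progression is periodic; its orbit has modulus / gcd(c, modulus) elements.
    period = modulus // gcd(c % modulus, modulus)
    return all(guard((x + k * c) % modulus) for k in range(period))


def nonterminating_starts(c, guard, width=8):
    """All start values that are never excluded (reachability is assumed)."""
    return [x for x in range(1 << width) if runs_forever(x, c, guard, width)]
```

For the guard `x != 0` and c = 2, exactly the odd start values survive, whereas for c = 1 every start value eventually reaches 0 and is excluded.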

The termination and non-termination analyses are run in parallel, and the first definite answer is used. Among the new non-termination analyses, several rounds of unwinding are first tried with the singleton recurrence set approach. If that is not sufficient, the arithmetic progression approach is tried. If that does not succeed either, further rounds of unwinding with the former approach are run.

### **3 Strengths and Weaknesses**

2LS' core algorithm, *k*I*k*I, is designed to be efficient for simultaneously finding proofs as well as refutations. Our SSA encoding allows us to introduce abstractions only at certain program points where these are necessary to infer the predicates required to construct proofs (e.g. invariants, ranking functions, recurrence sets). The remaining program is represented in a bit-precise large-block encoding.

Compared to the previous editions of the competition, 2LS is now able to reason about dynamic linked data structures. The approach used is currently able to handle various forms of linked lists (singly- or doubly-linked, a subset of nested or circular lists). However, more elaborate template domains will be required to handle other dynamic data structures such as trees and more general graph structures.

2LS' template-based approach to abstract interpretation allows easy combination of domains. We combine the heap domain with intervals over bitvectors, which is sufficient for many benchmarks. However, some benchmarks, e.g. those requiring reasoning about array contents, demand stronger invariants than we are currently able to infer.

The termination analysis scales well, but is currently limited to rather simple termination conditions (lexicographic linear). The newly implemented non-termination analyses are surprisingly effective on many SV-COMP termination benchmarks (638 out of 657 non-termination benchmarks proved). However, if a larger number of unwindings is needed, the approach becomes quite inefficient. *k*I*k*I does not yet support recursion, which is another limitation, in particular w.r.t. the SV-COMP termination benchmark set, which contains a large number of recursive programs. The output of witnesses in the new categories (memory safety and termination) is still lacking (more than 550 points have been lost there).

### **4 Tool Setup**

The competition submission is based on 2LS version 0.6.<sup>1</sup> Installation instructions are given in the file COMPILING. The executable 2ls is in the directory src/2ls. See the 2ls wrapper script (contained in the tarball) for the relevant command line options given to 2LS. The BenchExec script is called two_ls.py and the benchmark definition file is 2ls.xml. As a back end, the competition submission of 2LS uses Glucose 4.0. 2LS competes in all categories except Concurrency.

### **5 Software Project**

2LS is maintained by Peter Schrammel with pull requests contributed by the community. It is publicly available under a BSD-style license. The source code is available at http://www.github.com/diffblue/2ls.

<sup>1</sup> Executable available at https://gitlab.com/sosy-lab/sv-comp/archives/tags/svcomp18.

### **References**



# **YOGAR-CBMC: CBMC with Scheduling Constraint Based Abstraction Refinement (Competition Contribution)**

Liangze Yin<sup>1</sup>, Wei Dong<sup>1(✉)</sup>, Wanwei Liu<sup>1</sup>, Yunchou Li<sup>2</sup>, and Ji Wang<sup>1</sup>

<sup>1</sup> National University of Defense Technology, Changsha, China, yinliangze@163.com, wdong@nudt.edu.cn

<sup>2</sup> Beijing Institution of Tracking and Telecommunication Technology, Beijing, China

**Abstract.** This paper presents the Yogar-CBMC tool for verification of multi-threaded C programs. It employs a scheduling constraint based abstraction refinement method for bounded model checking of concurrent programs. To obtain effective refinement constraints, we have proposed the notion of *Event Order Graph (EOG)*, and have devised two graph-based algorithms over EOG for counterexample validation and refinement generation. The experiments in SV-COMP 2017 show the promising results of our tool.

### **1 Verification Approach and Software Architecture**

Bounded model checking (BMC) is among the most efficient techniques for concurrent program verification [1]. However, due to non-deterministic interleavings, a huge encoding is required for an exact description of the thread interaction.

Yogar-CBMC is a verification tool for multi-threaded C programs based on shared variables under *sequential consistency (SC)*. For these programs, we have observed that the *scheduling constraint*, which states that "for any pair w, r s.t. r reads the value of a variable v written by w, there should be no other write of v between them", contributes significantly to the complexity of the behavior encoding. In existing work on BMC, the scheduling constraint is encoded into a complicated logic formula, the size of which is cubic in the number of shared memory accesses [2].
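The scheduling constraint is easy to state operationally for one fixed interleaving. The sketch below checks it on a given total order of events; it is only meant to illustrate the quoted definition and is unrelated to the symbolic encoding used by BMC tools. The data layout (`writes`, `read_from`) and the function name are our own.

```python
def respects_scheduling_constraint(order, writes, read_from):
    """Check the scheduling constraint on one total order of events.

    order: list of event identifiers, earliest first.
    writes: dict mapping each write event to the variable it writes.
    read_from: dict mapping each read event to the write it reads from.
    """
    position = {event: index for index, event in enumerate(order)}
    for read, write in read_from.items():
        if position[write] > position[read]:
            return False              # a read cannot precede its own write
        variable = writes[write]
        for other, other_variable in writes.items():
            if (other != write and other_variable == variable
                    and position[write] < position[other] < position[read]):
                return False          # another write to v lies in between
    return True
```

Checking every triple (w, r, other write) like this is where a cubic number of constraints comes from when the order is left symbolic.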

To avoid the huge encoding of the scheduling constraint, Yogar-CBMC performs abstraction refinement by weakening and strengthening the scheduling constraint [3]. Figure 1 demonstrates the high-level overview of its architecture. We initially ignore the scheduling constraint and then obtain an over-approximation abstraction ϕ<sub>0</sub> of the original program (w.r.t. the given loop unwinding depth). If the property is safe on the abstraction, then it also holds on the original bounded program. Otherwise, an abstraction counterexample is obtained and the abstraction will be refined if the counterexample is infeasible.

This work was supported by the National Key R&D Program of China (No. 2017YFB1001802); the 973 National Program on Key Basic Research Project of China (No. 2014CB340703); and the National Natural Science Foundation of China (No. 61690203, No. 61532007).

© The Author(s) 2018

D. Beyer and M. Huisman (Eds.): TACAS 2018, LNCS 10806, pp. 422–426, 2018. https://doi.org/10.1007/978-3-319-89963-3\_25

**Fig. 1.** High-level overview of Yogar-CBMC architecture.

The performance of this method depends significantly on the generated refinement constraints. Ideally, a refinement constraint should be small yet prune a large part of the search space in each iteration. To achieve this goal, we have proposed the notion of *Event Order Graph (EOG)*, and have devised two graph-based algorithms over EOG for counterexample validation and refinement generation. Given an abstraction counterexample $\pi$, the corresponding EOG $G_\pi$ captures all the event order requirements of $\pi$ defined in the scheduling constraint. The counterexample $\pi$ is feasible iff the EOG $G_\pi$ is feasible. To validate the feasibility of $G_\pi$, we have proposed several deduction rules to deduce the implicit order requirements of $G_\pi$. If any cycle exists in $G_\pi$, then both $\pi$ and $G_\pi$ are infeasible. A graph-based refinement algorithm is then employed to analyze all the possible "kernel reasons" of all cycles. By eliminating the "redundant" kernel reasons, we can usually obtain a small set of "core kernel reasons", which can usually be encoded into a small refinement constraint. The experimental results show that: (1) Our graph-based EOG validation method is powerful enough in practice. Given an infeasible EOG, it identifies the infeasibility in all but rare cases. (2) Our graph-based refinement method is effective. If some cycle exists in $G_\pi$, it can usually obtain a small refinement constraint which prunes a large part of the search space.
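The cycle test at the heart of the graph-based validation can be illustrated with a plain depth-first search. The sketch below is not the authors' algorithm: it omits the deduction rules and the kernel-reason analysis, and the adjacency-list representation and the event names are assumptions chosen for the example.

```python
from typing import Dict, Hashable, List, Optional

# Hypothetical representation: an edge u -> v means "event u must happen before v".
Graph = Dict[Hashable, List[Hashable]]


def find_cycle(graph: Graph) -> Optional[List[Hashable]]:
    """Return one cycle as a list of nodes (first == last), or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2
    nodes = set(graph) | {v for succs in graph.values() for v in succs}
    colour = {n: WHITE for n in nodes}
    stack: List[Hashable] = []

    def visit(u: Hashable) -> Optional[List[Hashable]]:
        colour[u] = GREY
        stack.append(u)
        for v in graph.get(u, []):
            if colour[v] == GREY:          # back edge: v is on the current path
                return stack[stack.index(v):] + [v]
            if colour[v] == WHITE:
                found = visit(v)
                if found is not None:
                    return found
        stack.pop()
        colour[u] = BLACK
        return None

    for n in sorted(nodes, key=repr):
        if colour[n] == WHITE:
            cycle = visit(n)
            if cycle is not None:
                return cycle
    return None


# Contradictory order requirements: w1 < r1 < w2 < w1.
eog = {"w1": ["r1"], "r1": ["w2"], "w2": ["w1"]}
print(find_cycle(eog) is not None)             # a cycle means the orders cannot all hold
print(find_cycle({"w1": ["r1"], "r1": ["w2"]}))  # acyclic, so no verdict from this test
```

A cycle returned by such a search is only the starting point for refinement; which edges count as a "kernel reason" depends on how they were derived, which this sketch does not model.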

If no cycle exists in $G_\pi$, we cannot yet conclude whether the EOG is feasible. We employ a constraint-based EOG validation process to further validate its feasibility by constraint solving. If infeasibility is determined, a constraint-based refinement generation process is performed to refine the abstraction, which obtains only one kernel reason of the infeasibility. Enhanced by these two constraint-based processes, we have proved that our method is sound and complete w.r.t. the given loop unwinding depth.
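The overall refinement loop described above can be summarised by a toy model in which the "program" is an explicit list of candidate executions. This is only a sketch under strong simplifying assumptions: the real tool works on logic formulas and a solver, and the names `violates`, `is_feasible`, and `explain` are hypothetical stand-ins for property checking, counterexample validation, and refinement generation.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Constraint = Callable[[int], bool]


def refine_until_decided(
    candidates: Iterable[int],
    violates: Callable[[int], bool],
    is_feasible: Callable[[int], bool],
    explain: Callable[[int], Constraint],
) -> Tuple[str, Optional[int], int]:
    """Toy abstraction-refinement loop; explain(cex) must exclude cex itself."""
    candidates = list(candidates)
    constraints: List[Constraint] = []   # refinement constraints learned so far
    refinements = 0
    while True:
        # The abstraction admits every candidate that no constraint excludes.
        cex = next((c for c in candidates
                    if violates(c) and all(ok(c) for ok in constraints)), None)
        if cex is None:
            return "SAFE", None, refinements
        if is_feasible(cex):
            return "UNSAFE", cex, refinements
        # Infeasible counterexample: strengthen the abstraction and retry.
        constraints.append(explain(cex))
        refinements += 1


# Hypothetical instance: odd numbers violate the property, only 5 is feasible.
weak = lambda cex: (lambda c, bad=cex: c != bad)   # excludes a single candidate
strong = lambda cex: (lambda c: c >= 5)            # excludes many candidates at once
print(refine_until_decided(range(8), lambda c: c % 2 == 1, lambda c: c == 5, weak))
print(refine_until_decided(range(8), lambda c: c % 2 == 1, lambda c: c == 5, strong))
```

In this toy setting the stronger constraint reaches the same verdict with fewer refinements, which mirrors the stated goal of refinement constraints that are small yet prune a large part of the search space.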

Consider the example shown in Fig. 2. We attempt to verify that it is impossible for both m and n to be 1 after the exit of threads thr1 and thr2, which has a modular proof in this program. In this example, we have observed that:


**Fig. 2.** An illustration example.

small sizes yet reduce a large amount of the search space, and our graph-based refinement method is effective.

### **2 Strengths and Weaknesses**

The strengths of our tool include: (1) Our approach is a general purpose technique for multi-threaded C program verification, not assuming any special characteristics of the programs. Our tool supports nearly all features of C and PThreads. (2) Our approach is efficient in practice. Without the scheduling constraint, the size of the encoding can be dramatically reduced. Moreover, it can usually verify the property with a small number of refinements, while the refinement constraints usually have small sizes. (3) Enhanced by the constraint-based counterexample validation and refinement generation processes, our approach is sound and complete w.r.t. the given loop unwinding depth. It provides both proofs and refutations for the property. If the property is found to be false, a counterexample will be provided. (4) As the abstractions usually have small sizes, our tool generally consumes less memory than those tools giving an exact description of the scheduling constraint. In this sense, our tool is more scalable.

We have applied Yogar-CBMC to the benchmarks in the concurrency track of SV-COMP 2017. Our tool has successfully verified all these examples within 1550 s and 43 GB of memory. It has won the gold medal in the Concurrency Safety category of SV-COMP 2017 [4].

However, for programs where the scheduling constraint is not the major part of the encoding, our method may still need dozens of refinements. Since the abstractions may then be similar in size to the monolithic encoding, our tool may perform worse than tools based on a monolithic encoding. Moreover, for real-world programs with a large number of read/write accesses and complex data structures, how to reduce the number of refinements and how to deal with shared structure members more efficiently are still challenging problems.

### **3 Tool Setup and Configuration**

The binary file of Yogar-CBMC for Ubuntu 16.04 (x86\_64-linux) is available at https://gitlab.com/sosy-lab/sv-comp/archives. It is implemented on top of CBMC-4.9<sup>1</sup>. Its setup and configuration are the same as those of CBMC. The tool-info module and benchmark definition of our tool are "yogar-cbmc.py" and "yogar-cbmc.xml", respectively.

Our tool needs two parameters of CBMC: --no-unwinding-assertions and --32. The unwind bound of Yogar-CBMC is dynamically determined through a syntax analysis. Particularly, the bound is set to 2 for programs with arrays, and n if some of the program's for loops are upper bounded by a constant n, which is the same as for MU-CSeq [5]. To run Yogar-CBMC for a program file, just use the following command:

```
./yogar-cbmc --no-unwinding-assertions --32 -
                                              file
```
*Participation/Opt Out.* Yogar-CBMC competes only in the *concurrency category*.

### **4 Software Project and Contributors**

Yogar-CBMC is developed at HPCL, School of Computers, National University of Defense Technology, and includes contributions by the authors of this paper. Its source code is available at https://github.com/yinliangze/yogar-cbmc. For more information, contact Liangze Yin.

### **References**


<sup>1</sup> Downloaded from https://github.com/diffblue/cbmc/releases on Nov 20, 2015.


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **CPA-BAM-Slicing: Block-Abstraction Memoization and Slicing with Region-Based Dependency Analysis (Competition Contribution)**

Pavel Andrianov, Vadim Mutilin, Mikhail Mandrykin<sup>(✉)</sup>, and Anton Vasilyev

Ivannikov Institute for System Programming of the Russian Academy of Sciences, Moscow, Russia {andrianov,mutilin,mandrykin,vasilyev}@ispras.ru

**Abstract.** Our submission to SV-COMP'18 is a composite tool based on the software verification framework CPAchecker and the static analysis platform Frama-C. The base verifier uses a combination of predicate and explicit value analysis with block-abstraction memoization, as in the CPA-BAM-BnB tool presented at SV-COMP'17. In this submission we augment the verifier on reachability verification tasks with a slicer that removes statements that are irrelevant to the reachability of error locations in the analysed program. The slicer is based on context-sensitive flow-insensitive separation analysis with typed polymorphic regions and simple dependency analysis with transitive closures. The resulting analysis preserves reachability modulo possible non-termination while removing enough irrelevant code to achieve considerable speedup of the main analysis. The slicer is implemented as a Frama-C plugin.

### **1 Verification Approach**

The submission presents a composite setting comprising the mature static verification tool CPAchecker [1] and an experimental reachability slicer (a Frama-C [2] plugin) intended to speed up verification by pruning the verification scope prior to the application of the main analysis. By verification scope we mean the code to be analyzed rather than the search space explored by the main analysis, since the slicer does not prune the search space as such, but rather removes statements (including function calls) that can be proved not to influence the verification outcome. The slicer included in this submission is currently only applicable to reachability verification tasks, though the underlying algorithm is not generally limited to reachability of a small number of error locations and so can potentially be extended to support, e.g., memory safety properties.

M. Mandrykin: Jury member.

The research was supported by RFBR grant 18-01-00426.

© The Author(s) 2018

D. Beyer and M. Huisman (Eds.): TACAS 2018, LNCS 10806, pp. 427–431, 2018. https://doi.org/10.1007/978-3-319-89963-3\_26

The slicer is based on a relatively simple mark-and-sweep algorithm, where the relevant statements are first identified by computing the transitive closure of the dependency relation, then marked, and finally the remaining statements are removed to produce a sliced verification task. The mark-and-sweep slicing is performed on top of a preliminary region analysis, which allows abstract memory locations ascribed to the corresponding disjoint memory regions to be handled essentially like ordinary unaliased program variables.
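The mark-and-sweep idea can be sketched in a few lines of Python. This is an illustration only, not the Crude slicer implementation (which is written in OCaml): the statement labels, the `depends_on` map, and the function name are assumptions, and the region analysis that produces the dependencies is not modelled.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def slice_statements(statements: List[str],
                     depends_on: Dict[str, Set[str]],
                     criteria: Iterable[str]) -> List[str]:
    """Keep only statements that the slicing criteria transitively depend on."""
    marked: Set[str] = set()
    worklist = deque(criteria)
    # Mark phase: transitive closure of the dependency relation.
    while worklist:
        stmt = worklist.popleft()
        if stmt in marked:
            continue
        marked.add(stmt)
        worklist.extend(depends_on.get(stmt, ()))
    # Sweep phase: drop everything unmarked, preserving program order.
    return [s for s in statements if s in marked]


# Hypothetical program: s4 is the error location, s2 is irrelevant to it.
program = ["s1", "s2", "s3", "s4"]
deps = {"s4": {"s3"}, "s3": {"s1"}}
print(slice_statements(program, deps, ["s4"]))
```

In this example the statement labelled `s2` is swept away because nothing reachable from the error location depends on it.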

The region analysis implemented in the current submission is a conservative over-approximation of context-sensitive flow-insensitive separation analysis with polymorphic regions for deductive verification. It was first described in [3] and later substantially extended in [4]. The conservative approximation is needed because the original analysis generally requires user annotations. The over-approximation is expressed in the form of additional dependencies introduced at the marking stage rather than in the region analysis itself. The dependencies make it possible to approximate reinterpretations of memory regions (corresponding to the use of unions and arbitrary pointer type casts), but not some corner cases of pointer arithmetic (mostly arithmetic dependent on a particular layout of structure fields), so the resulting analysis remains unsound in the general case. However, benchmarking the analysis with CPAchecker as the reachability verifier on the tasks in the SV-COMP SoftwareSystems category showed no cases of unsoundness caused by the region analysis. This may be explained by the fact that most of the cases where the analysis is unsound with respect to a low-level C memory model are also regarded as undefined behavior by the C standard, and so are probably quite rare in practice.

### **2 Software Architecture**

The main CPAchecker verification framework is included in the submission without any considerable changes. The combined tool is implemented as a wrapper script that encapsulates the main verifier invocation and does the following:


The slicer (named Crude slicer) is implemented as a plugin to Frama-C [2], an extensible platform for source-code analysis of C software. The plugin implementation does not interact with other Frama-C plugins and only makes use of the Frama-C kernel. The plugin also uses the OCamlgraph [5] library. Both the Frama-C platform and the Crude slicer plugin are implemented in OCaml.

The witness post-processing stage currently simply removes the character offsets from the resulting witness (the line numbers are preserved using line directives supported by CPAchecker) and substitutes the checksum of the original program source.

Since the SoftwareSystems category of the competition also contains memory safety (and overflow) verification tasks, the submission also includes the memory safety configuration smg-ldv, based on the shape analysis presented in [6].

### **3 Evaluation of the Approach**

The slicer is currently able to handle only reachability verification tasks. It was evaluated on 2734 tasks from the Systems\_DeviceDriversLinux64\_ReachSafety subcategory of the SV-COMP'18 benchmarks on Intel Xeon E3-1230 v5 (3.4 GHz) machines in the competition setting. The submitted configuration with slicing was compared to the baseline CPA-BAM-BnB [7,8] configuration (-ldv-bam-svcomp) without slicing, which was also submitted to this year's competition. The results are presented in the following table:


The table presents the results for correct verdicts only and does not take witness checking into account.

There are two significant limitations of the approach. First, the slicing is performed under the assumption that all possible execution paths in the verified program are finite. This does not lead to unsoundness, since reachability (as a safety property) can be assumed to be violated only on finite paths. However, there are 3 wrong FALSE verdicts reported on benchmarks where an error location is spuriously reached after passing through an infinite loop removed by the slicer. Second, the resulting tool cannot produce precise witnesses, both due to imprecision in source code locations and (more importantly) due to the unavailability of either invariants or error paths in the sliced-out parts of the code. This caused 1090 TRUE verdicts and all FALSE verdicts to fail confirmation by the witness checkers in the competition.

The time required for slicing varies from 0.08 to 1905.47 s with an average of 14.82 s. So in the submission the slicer is run with a timeout of 400 s and the remaining tasks (17 out of 2734 in the evaluation) are passed to the main verifier without slicing.

### **4 Tool Setup and Configuration**

The submission is available for download as a ZIP archive named cpa-bam-slicing.zip from the SV-COMP repository at the following URL: https://gitlab.com/sosy-lab/sv-comp/archives/tree/master/2018. The submission includes CPAchecker version 1.6.1 and a statically linked version of Frama-C Sulfur-20171101-beta with the Crude slicer plugin. The version of the plugin corresponds to commit fcd3b927. CPAchecker requires a Java 8 runtime environment. The invocation of the slicer is embedded in the CPAchecker wrapper script, so the whole tool has to be executed with the following command line:

```
scripts/cpa.sh -ldv-bam-svcomp -disable-java-assertions
            -heap 10000m -spec prop.prp program.c
```
The tool participates in the SoftwareSystems category; the corresponding benchmark definition is cpa-bam-slicing.xml.

**Acknowledgements.** The CPAchecker project is open-source and developed by an international research group from Ludwig-Maximilian University of Munich, University of Passau, Ivannikov Institute for System Programming of the Russian Academy of Sciences and several other universities and institutions. More information about the project can be accessed at https://cpachecker.sosy-lab.org. The slicer is developed as part of the Linux Driver Verification project [9] (http://linuxtesting.org/ldv); the slicer project page is https://forge.ispras.ru/projects/crude\_slicer. Both the CPAchecker tool and the Crude slicer plugin are distributed under the terms of the Apache License, Version 2.0. The Frama-C platform (http://frama-c.com/) is co-developed at two French public institutions, CEA LIST and INRIA Saclay – Île-de-France, and licensed under GNU LGPL v2. We thank all contributors of the projects for their work.

### **References**



# **InterpChecker: Reducing State Space via Interpolations (Competition Contribution)**

Zhao Duan<sup>1</sup>, Cong Tian<sup>1(✉)</sup>, Zhenhua Duan<sup>1</sup>, and C.-H. Luke Ong<sup>2</sup>

<sup>1</sup> ICTT and ISN Lab, Xidian University, Xi'an 710071, People's Republic of China ctian@mail.xidian.edu.cn

<sup>2</sup> Department of Computer Science, University of Oxford, Oxford, UK

**Abstract.** InterpChecker is a tool for verifying safety properties of C programs. It reduces the state space of programs throughout the verification via two new kinds of interpolations and associated optimization strategies. The implementation builds on the open-source, configurable software verification tool, CPAChecker.

### **1 Verification Approach**

Our approach to scalable CEGAR-based model checking is to exploit Craig interpolation [3] to learn abstractions that can systematically reduce the program state space which must be explored for a given safety verification problem. In addition to the interpolants for parsimonious abstraction [4] (called *reachability interpolants* (*R-Interp*) here for clarity), we introduce two new kinds of interpolants, called *universal safety interpolants* and *existential error interpolants*.


This research is supported by the NSFC grant No. 61420106004, 61732013, and 61751207. The work was done partially while Duan and Ong were visiting the Institute for Mathematical Sciences, National University of Singapore in 2016. The visit was partially supported by the Institute.

© The Author(s) 2018

D. Beyer and M. Huisman (Eds.): TACAS 2018, LNCS 10806, pp. 432–436, 2018. https://doi.org/10.1007/978-3-319-89963-3\_27

The *S-Interp* at a location of a control flow graph (CFG) collects predicates that are relevant to a yes-instance of the safety verification, so that whenever the *S-Interp* is implied by the current path, all paths emanating from this location are guaranteed to be safe. Dually, whenever the *E-Interp* at a location of a CFG is implied by the current path, there is an unsafe branch from it, and so one can immediately conclude that the program is unsafe. We learn *S-Interp* and *E-Interp* from spurious error traces and apply them to reduce the state space of programs throughout the CEGAR-based program verification process. For convenience, we denote a CFG as a tuple $G = (L, T, l_0, f)$, where $L$ is the set of program locations, $l_0 \in L$ is the initial location, $f \in L$ is the final location, $T \subseteq L \times \mathit{Ops} \times L$ is the transition relation, and $\mathit{Ops}$ is the set of instructions.

When verifying a program, we first unwind the CFG to generate an Abstract Reachability Tree (ART). An ART $A = (S_A, E_A)$, obtained from a CFG $G = (L, T, l_0, f)$, consists of a set $S_A$ of abstract states and a set $E_A$ of edges. An *abstract state* $s \in S_A$ is a triple $s = (l, c, p)$ where $l$ is a location in the CFG, $c$ is the current call stack, and $p$ is an abstract predicate indicating the reachable region of the current state, which is determined by the reachability interpolant, *R-Interp*. Given two states $s$ and $s'$, we say $s$ is *covered* by $s'$ just if $s[0] = s'[0]$, $s[1] = s'[1]$, and $s[2] \to s'[2]$. (Notation: for a tuple $e$, we write $e[i]$ for the $i$-th component of $e$.) Further, if $s$ is covered by $s'$ and all the future of $s'$ (i.e. all abstract states reachable from $s'$) has been explored, then it is safe to not explore the future of $s$. A branch (path) $\Pi$ of an ART, denoting a possible execution of the program, is a finite alternating sequence of states and edges, $\Pi = s_0, e_0, \cdots, e_{n-1}, s_n$, such that for all $0 \le i < n$, $e_i[0] = s_i$ and $e_i[2] = s_{i+1}$. Given a path $\Pi$ of an ART, we write $P_f(\Pi)$ for the path formula $ssa(e_0[1]) \wedge \cdots \wedge ssa(e_{n-1}[1])$ obtained from $\Pi$. Here $ssa(\mathit{op})$ is the static single assignment (SSA) form of an operation $\mathit{op}$, where every variable occurring in $\Pi$ is assigned a value at most once.
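As an illustration of the path formula, the following sketch renames variables into SSA form along a single path and conjoins the results. The operation encoding, the token-based expressions, and the function name are assumptions for this example; in the output, `&` stands for logical conjunction and `x_k` for the k-th version of `x`.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical operation encoding along one ART path:
#   ("assign", target, rhs_tokens)  e.g. ("assign", "x", ["x", "+", "1"])
#   ("assume", cond_tokens)         e.g. ("assume", ["x", ">", "0"])
# Tokens are assumed to be variable names, numeric literals, or operators.
Op = Union[Tuple[str, str, List[str]], Tuple[str, List[str]]]


def path_formula(ops: List[Op]) -> str:
    version: Dict[str, int] = {}

    def use(token: str) -> str:
        # Rename identifiers to their current SSA version; keep other tokens.
        if token.isidentifier():
            return f"{token}_{version.get(token, 0)}"
        return token

    conjuncts = []
    for op in ops:
        if op[0] == "assign":
            _, target, rhs = op
            rhs_text = " ".join(use(t) for t in rhs)   # read old versions first
            version[target] = version.get(target, 0) + 1
            conjuncts.append(f"({target}_{version[target]} = {rhs_text})")
        elif op[0] == "assume":
            conjuncts.append("(" + " ".join(use(t) for t in op[1]) + ")")
        else:
            raise ValueError(f"unknown operation kind: {op[0]}")
    return " & ".join(conjuncts)


path = [("assign", "x", ["0"]),
        ("assign", "x", ["x", "+", "1"]),
        ("assume", ["x", ">", "0"])]
print(path_formula(path))  # (x_1 = 0) & (x_2 = x_1 + 1) & (x_2 > 0)
```

Because each assignment introduces a fresh version, every versioned variable is defined at most once, which is the property the SSA form provides for the path formula.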

Given a CFG whose locations are enriched with default values of *R-Interp*, *S-Interp*, and *E-Interp*, we construct the ART for exploring a real counterexample by starting from the root, i.e. $s_0 : (l_0, -, \mathit{true})$. The flowchart in Fig. 1 gives a bird's eye view of our approach to safety verification with reachability, safety and error interpolations. When a state $s : (l, c, p)$ is being explored and $l$ is not an error location:

	- $p = \mathit{false}$;
	- $p \neq \mathit{false}$, $F(l) = f$, and $P_f(s_0, \cdots, s) \to I_s(l)$;
	- $p \neq \mathit{false}$ and $s$ is covered by a visited state $s'$.

When $l$ of the current state $s : (l, c, p)$ is an error location, we first check whether the current path $\Pi = s_0, \cdots, s$ is spurious. If $\Pi$ is not spurious, we conclude that the program is unsafe. Otherwise, by *update S-Interp*, *update E-Interp*, and *update R-Interp* [5], the *S-Interp*, *E-Interp*, and *R-Interp* of locations involved in $\Pi$ are updated, respectively. Subsequently, we backtrack along the current path to look for other possibilities and treat a new current state $s : (l, c, p)$ in the same way until the program is reported as unsafe or there are no more states to be explored.

**Fig. 1.** Interpolation aided CEGAR approach for program verification

To maximise the effect of the proposed interpolations, we also present two optimization strategies: *pruning CFG* and *weight-guided search*. In real-world programs, there may exist some locations in a CFG which can never reach any error location. To avoid exploring these locations when verifying the program, the first strategy is to prune the CFG by removing these locations and the related control flow edges in advance. A safety interpolant works only when it is full. Hence, the earlier full safety interpolants are formed, the more effective they become. To form full interpolants, the second strategy is to explore one side of a branch as early as possible if the other side has been explored. This goal is achieved by introducing an attribute *weight* to transitions of a CFG. Throughout the verification, the branch with the largest weight is explored first.
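The first strategy, pruning the CFG, amounts to a backward reachability computation from the error locations. The sketch below illustrates that idea only; it is not InterpChecker code, and the edge encoding, location names, and function name are assumptions.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source location, operation, target location)


def prune_cfg(transitions: List[Edge], error_locations: Iterable[str]) -> List[Edge]:
    """Keep only transitions that lie on some path to an error location."""
    predecessors: Dict[str, Set[str]] = {}
    for src, _, dst in transitions:
        predecessors.setdefault(dst, set()).add(src)
    # Backward breadth-first search from the error locations.
    can_reach_error: Set[str] = set(error_locations)
    worklist = deque(can_reach_error)
    while worklist:
        loc = worklist.popleft()
        for pred in predecessors.get(loc, ()):
            if pred not in can_reach_error:
                can_reach_error.add(pred)
                worklist.append(pred)
    # A transition is kept exactly when its target can still reach an error.
    return [(s, op, d) for (s, op, d) in transitions if d in can_reach_error]


# Hypothetical CFG: the branch through l2 and l3 never reaches the error location.
cfg = [("l0", "x = 1", "l1"), ("l1", "assume x > 0", "err"),
       ("l0", "y = 2", "l2"), ("l2", "y = y + 1", "l3")]
print(prune_cfg(cfg, ["err"]))
```

Here the two transitions through `l2` and `l3` are removed before exploration starts, so no abstract states are ever created for them.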

### **2 Software Architecture**

Our implementation of InterpChecker builds on the open-source, configurable software verification tool, CPAChecker [1]. Like CPAChecker, InterpChecker can verify safety properties of C programs via reachability checking of the instrumented error labels. All extra functions are implemented in Java, using the existing libraries provided by CPAChecker. In Fig. 1, the white parts are new, while the grey parts are original CPAChecker functions. We set up the InterpChecker interpolants and optimizations as an option of CPAChecker, organised as a refinement-selection configuration, in the sense of [2].

### **3 Strengths and Weaknesses**

The new interpolants implemented in InterpChecker do not affect the existing configurations of CPAChecker. InterpChecker supports the verification of safety properties of C programs via reachability checking of the instrumented error labels. The power of InterpChecker is best illustrated when analysing large-scale programs, because there it can avoid exploring more paths. The current version does not support the verification of properties written as temporal logic formulas. Like CPAChecker, we skip recursive functions and treat them as pure functions. Thus, false negatives may occur for programs with recursive functions.

### **4 Tool Setup and Configuration**

A zipped file containing InterpChecker 1.0 is available at http://github.com/duanzhao-dz/interpchecker. It contains all the required libraries: no installation of external tools is required. To run InterpChecker, first download the code from the website, then run the following command to install the package in Ubuntu 16.04: sudo apt-get install openjdk-8-jdk.

To process a benchmark example test.c, invoke the script with the following command: ./scripts/cpa.sh -sv-comp18-interpcpachecker test.c. The output of InterpChecker is written to the file output/Statistics.txt. When using BenchExec, the output can be translated by the interpchecker.py tool-info module. The categories verified by the competition candidate are listed in the file interpchecker.xml. Both files are contained in the zipped file. If the checked property does not hold, a human-readable counterexample is written to output/ErrorPath.txt and an error witness is written to the zipped file witness.graphml.gz. Note that a Java Runtime Environment compatible with at least Java 8 is required.

### **5 Software Project and Contributors**

Based on the open source tool CPAChecker, InterpChecker is developed by Xidian University, China, and the University of Oxford, UK. We thank Dirk Beyer and his team for their original contributions to CPAChecker.

### **References**



# **Map2Check Using LLVM and KLEE (Competition Contribution)**

Rafael Menezes<sup>1</sup>, Herbert Rocha<sup>1(✉)</sup>, Lucas Cordeiro<sup>2</sup>, and Raimundo Barreto<sup>3</sup>

<sup>1</sup> Department of Computer Science, Federal University of Roraima, Boa Vista, Brazil herberthb12@gmail.com

<sup>2</sup> Department of Computer Science, University of Oxford, Oxford, UK

<sup>3</sup> Institute of Computing, Federal University of Amazonas, Manaus, Brazil

**Abstract.** Map2Check is a bug hunting tool that automatically checks safety properties in C programs. It tracks memory pointers and variable assignments to check user-specified assertions, overflow, and pointer safety. Here, we extend Map2Check to: (i) simplify the program using Clang/LLVM; (ii) perform a path-based symbolic execution using the KLEE tool; and (iii) transform and instrument the code using the LLVM dynamic information flow. The SVCOMP'18 results show that Map2Check can be effective in generating and checking test cases related to memory management of C programs.

### **1 Overview**

Map2Check v7.1 uses source code instrumentation based on dynamic information flow to monitor data from different program executions. Map2Check automatically produces concrete inputs to the program via symbolic execution, in order to execute different program paths and to detect failures related to arithmetic overflow, invalid deallocation, invalid pointers, and memory leaks. Map2Check uses Clang [5] as a front-end, which supports the main C standards, e.g., C99 according to the standard ISO/IEC 9899:1999. In its previous version [7], Map2Check was able to automatically generate test cases to check memory management using bounded model checkers (e.g., ESBMC [4]). The main original contributions of Map2Check v7.1 are: (i) Clang [5] was added as a front-end to improve the symbolic execution of C programs; (ii) the LLVM [6] framework was adopted as a code transformation engine; and (iii) the KLEE [1] tool was integrated as a symbolic execution engine to automatically explore different program paths.

### **2 Verification Approach**

H. Rocha: Jury member.

© The Author(s) 2018

D. Beyer and M. Huisman (Eds.): TACAS 2018, LNCS 10806, pp. 437–441, 2018. https://doi.org/10.1007/978-3-319-89963-3\_28

The Map2Check tool is inspired by LEAKPOINT [3] and Symbiotic 4 [2], which use compiler techniques to analyze C programs using code instrumentation. The main novelty of Map2Check v7.1 is the integration of the LLVM Intermediate Representation (IR) to analyze and verify C programs. This LLVM IR is based on the static single assignment representation and provides type safety, low-level operations, and the capability of representing high-level languages. If we compare Map2Check to other related tools, e.g., Symbiotic 4, it does not perform static program slicing and does not use the symbolic execution of KLEE to directly explore the program state space. Map2Check applies source code instrumentation to monitor and gather areas of data memory from different concrete program executions; this code instrumentation focuses on exploring dynamic information flow to avoid the need for an approximate static analysis. Similarly to LEAKPOINT, Map2Check taints program data (e.g., variables or memory locations) with a taint mark metadata and then propagates the taint marks over the concrete program executions. Fig. 1 shows an overview of the Map2Check verification flow. The tool input is a C program and a safety property (e.g., overflow and pointer safety); it returns *TRUE* (if there is no path that violates the safety property), *FALSE* (if there exists a path that violates the safety property), or *UNKNOWN* otherwise.

**Fig. 1.** Map2Check verification flow.

The Map2Check verification flow has the following main steps: (A) convert the C code into the LLVM IR using Clang [5]; (B) apply specific code optimizations, e.g., dead code elimination and constant propagation; (C) add Map2Check library functions to track pointers, and add assertions into the LLVM bitcode; (D) connect the code instrumented by Map2Check to support the execution of its functions; (E) apply further Clang optimizations to improve the symbolic execution (e.g., canonicalize natural loops and promote memory to register); (F) generate concrete inputs for the Map2Check instrumented functions by performing symbolic execution of the analyzed code in LLVM IR using KLEE; and (G) generate witnesses: if a safety property is violated, then a "violation witness" is produced using the KLEE output to trace the error location; if there is no path that violates the safety property, then a "correctness witness" is produced, which identifies each basic block executed in the control flow graph of the LLVM IR using the concrete inputs produced by KLEE (LLVM syntactically enforces some of those basic blocks as invariants from its assignments).

Map2Check v7.1 tracks important data of the analyzed C code to identify functions and operations over pointers. Then, it checks the respective assertions via symbolic execution, which produces inputs to concretely execute the program. In particular, Map2Check tracks the heap memory used by the analyzed code using the following data log lists: **Heap log** tracks the allocated memory address (i.e., arguments of functions, functions, and variables) and its memory size in the heap memory; **Malloc log** tracks the addresses that are dynamically allocated/deallocated, their size and pointer actions (allocation and deallocation), executed at the current program location; and **List log** stores data about operations over pointers, e.g., the code line number for each operation, program scope, variable name, memory addresses, and addresses pointed to by program variables.
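To give an idea of how a log of dynamic allocations can support memory checks, the sketch below keeps a minimal allocation log and reports invalid deallocations and leaks. It is a simplified illustration, not Map2Check's data structure: the class name, its methods, and the message formats are hypothetical.

```python
from typing import Dict, List, Tuple


class MallocLog:
    """Hypothetical log of dynamic allocations, keyed by address."""

    def __init__(self) -> None:
        self.live: Dict[int, int] = {}                      # address -> size
        self.history: List[Tuple[str, int, int, int]] = []  # (action, address, size, line)
        self.errors: List[str] = []

    def on_malloc(self, address: int, size: int, line: int) -> None:
        self.live[address] = size
        self.history.append(("alloc", address, size, line))

    def on_free(self, address: int, line: int) -> None:
        if address not in self.live:
            # Never allocated, or already released: an invalid deallocation.
            self.errors.append(f"invalid free of {address:#x} at line {line}")
            return
        size = self.live.pop(address)
        self.history.append(("free", address, size, line))

    def on_exit(self) -> None:
        # Anything still live at program exit was never released.
        for address, size in self.live.items():
            self.errors.append(f"memory leak: {size} bytes at {address:#x}")


log = MallocLog()
log.on_malloc(0x1000, 8, line=3)
log.on_free(0x1000, line=5)
log.on_free(0x1000, line=6)      # second free of the same address
log.on_malloc(0x2000, 16, line=7)
log.on_exit()
print(log.errors)
```

In an instrumented program, calls corresponding to `on_malloc`, `on_free`, and `on_exit` would be inserted around the allocation functions and at program termination.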

Map2Check v7.1 implements a function map2check\_non\_det\_x, with x ranging over the supported C data types (e.g., char, int, and float), which is interpreted by KLEE to model non-deterministic values. In this respect, Map2Check v7.1 differs from its previous version, which models non-deterministic values with a function that returns a random number based on a probabilistic distribution. To check the unreachability of an error location, Map2Check identifies a given target function (e.g., \_\_VERIFIER\_error) and then replaces it by an error assertion where the target function is called. To check overflow, Map2Check adds an assertion before all arithmetic instructions over integers to analyze the results of the signed operations against the maximum and minimum integer values. To check pointer safety, Map2Check checks whether a given address to be deallocated is tracked in the Malloc log list and then identifies whether the deallocation of memory was already performed for that program location (invalid deallocation); Map2Check also identifies whether allocated memory was not released at the end of the program execution (memory leak). Additionally, Map2Check analyzes the memory addresses in the Malloc log and Heap log lists to identify if those addresses point to a valid address (invalid pointer). Map2Check does not distinguish between the usual "valid-memtrack" and "valid-memclean" properties in SV-COMP.
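The kind of assertion placed before a signed addition can be sketched as follows. This is an assumption-laden illustration rather than the instrumentation Map2Check actually emits: it fixes a 32-bit signed integer type, and the function names are hypothetical.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1  # assumes a 32-bit signed int


def add_overflows(a: int, b: int) -> bool:
    """True if a + b would leave the 32-bit signed range.

    The test never computes an out-of-range intermediate value, mirroring
    how such a check has to be phrased in C, where signed overflow is
    undefined behavior.
    """
    if b > 0:
        return a > INT_MAX - b
    return a < INT_MIN - b


def checked_add(a: int, b: int) -> int:
    # The assertion plays the role of the check inserted before the instruction.
    assert not add_overflows(a, b), "signed integer overflow in addition"
    return a + b


print(add_overflows(INT_MAX, 1))   # exceeds the maximum value
print(add_overflows(INT_MIN, -1))  # falls below the minimum value
print(checked_add(40, 2))
```

Analogous guards can be written for subtraction and multiplication; they follow the same pattern of comparing the operands against the maximum and minimum integer values before the operation is performed.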

### **3 Proposed Architecture**

Map2Check v7.1 is implemented as a source-to-source transformation tool in C/C++ using LLVM (v3.8.1). It uses Clang (v3.8.1) as a front-end to parse a C program and to generate the respective LLVM bitcode to be used in the code transformation to track pointers to areas of memory and variable assignments (Fig. 2). It uses KLEE (v1.2.0) as a path-based symbolic execution engine; STP<sup>1</sup> (v2.1.2) is used as the SMT solver by KLEE to check constraints over bit-vectors and arrays. The Boost<sup>2</sup> C++ library is used as a helper library,

<sup>1</sup> http://stp.github.io.

<sup>2</sup> http://www.boost.org.

**Fig. 2.** Map2Check architecture flow.

e.g., to generate the witness in the GraphML format. Map2Check participates in SV-COMP'18 (as specified in the map2check.xml benchmark definition) in the following categories: ReachSafety-Arrays, ReachSafety-BitVectors, ReachSafety-Heap, ReachSafety-Loops, ReachSafety-Recursive, MemSafety, and NoOverflows.

### **3.1 Availability and Installation**

Map2Check v7.1 (for 64-bit Linux) is available<sup>3</sup> under the GPL license. The Clang, LLVM, KLEE, and STP tools are included in the Map2Check distribution. Map2Check is invoked from the command line (as in the map2check.py module for BenchExec) as:

    ./map2check-wrapper.py -p propertyFile.prp file.i

Map2Check accepts the property file and the verification task and reports one of the following results: *TRUE + Witness, FALSE + Witness, or UNKNOWN*. For each error-path or correctness witness, a file (called witness.graphml) containing the witness is generated in the Map2Check root folder.

### **4 Strengths and Weaknesses of the Approach**

Map2Check exploits dynamic information flow by tainting program data. It uses Clang/LLVM as an industrial-strength compiler to simplify and instrument the code, and it employs KLEE to produce concrete inputs for different program executions. The integration between LLVM and KLEE opens up several possibilities to implement new testing and verification techniques in Map2Check. In particular, we intend to improve our symbolic execution by synthesizing inductive invariants to prove properties of loops and recursive programs and also to prune the search space, given that Map2Check bounds loops and recursion up to a given depth *k*. The SV-COMP'18 results show that Map2Check can be effective in generating and checking test cases of memory management for C programs. Map2Check achieved a score of 228 in the MemSafety category with no single

<sup>3</sup> https://github.com/hbgit/Map2Check/archive/map2check v7.1 svcomp18d.zip.

incorrect result; in particular, Map2Check produced the highest score (i.e., 106) in the MemSafety-Arrays subcategory. In the NoOverflows category, Map2Check achieved a score of −263; some incorrect results are due to our imprecise overflow check. In the ReachSafety category, Map2Check claims 312 correct results; however, it also reported 16 incorrect true and 1 incorrect false results. Some of these incorrect results are related to Map2Check's limited ability to handle loops and recursion.

**Acknowledgments.** We thank C. Cadar, D. Poetzl, and the anonymous reviewers for their comments, which helped us to improve the draft version of this paper.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Symbiotic 5: Boosted Instrumentation (Competition Contribution)**

Marek Chalupa(B), Martina Vitovská, and Jan Strejček

> Masaryk University, Brno, Czech Republic xchalup4@fi.muni.cz

**Abstract.** The fifth version of Symbiotic significantly improves instrumentation capabilities that the tool uses to participate in the category *MemSafety*. It leverages an extended pointer analysis re-designed for instrumenting programs with memory safety errors, and staged instrumentation reducing the number of inserted function calls that track or check the memory state. Apart from various bugfixes, we have ported Symbiotic (including the external symbolic executor Klee) to llvm 3.9 and improved the generation of violation witnesses by providing values of some variables.

#### **1 Verification Approach**

The basic approach of Symbiotic remains unchanged [7]: it uses instrumentation to reduce checking of specific properties (e.g. *no-overflow* or *memory safety*) to checking reachability of error locations. Then we apply slicing which removes the code that has no influence on reachability of these locations. Finally, we symbolically execute the sliced code using Klee [1] to refute or confirm that an error location is reachable.

For many years, our attention has been focused mainly on slicing [2,6,8]. Only in 2016, we implemented a configurable instrumentation that enabled Symbiotic to check memory safety or, in general, any safety property. Consequently, Symbiotic 4 [4] participated for the first time in the category *MemSafety* where it won the bronze medal.

The instrumentation used in Symbiotic 4 to check memory safety inserts calls to functions that *track* every block of allocated memory and calls to functions that *check* validity of dereferences using the tracked information. A check is not inserted if a static pointer analysis guarantees that the dereferenced pointer points to a memory block that was allocated before. We later recognized a flaw in this optimization: a standard pointer analysis ignores memory deallocations and, hence, it can tell that a pointer can point to memory blocks allocated by specific program lines, but it does not tell whether these memory blocks are

The research is supported by the Czech Science Foundation grant GBP202/12/G061. M. Chalupa—Jury member.

© The Author(s) 2018

D. Beyer and M. Huisman (Eds.): TACAS 2018, LNCS 10806, pp. 442–446, 2018. https://doi.org/10.1007/978-3-319-89963-3\_29

**Fig. 1.** Quantile plot of running times of the three considered configurations of Symbiotic 5. On the x-axis are the benchmarks sorted according to the corresponding running times and on the logarithmic y-axis are the times.

*still* allocated. As a result, Symbiotic 4 sometimes does not insert a check even if the dereference may be invalid and thus it may miss some bugs.

In Symbiotic 5, we have fixed and significantly boosted the instrumentation part. First, we have extended the above-mentioned pointer analysis such that it takes deallocations into account as well. Second, the instrumentation now works in two stages. The first stage inserts checks wherever the extended pointer analysis cannot guarantee the safety of a dereference. Moreover, compared to Symbiotic 4, we use simpler checks if possible. For example, if the pointer analysis says that a given pointer points into a known fixed-size memory block, we just insert a check that the pointer's offset is within the size of the block (without searching the tracked information about the block). The second stage inserts calls to memory tracking functions only at allocations of memory blocks that can be accessed by some dereference instrumented in the first stage. Hence, we track only the information that may actually be used in the checks.
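The two stages can be pictured with the following Python sketch. It is an illustration only: the `Deref` record, the check labels, and the allocation-site names are hypothetical, and the sketch assumes that only "full" checks consult tracked information, so only their possible targets need tracking. Symbiotic itself performs these steps on llvm bitcode.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Deref:
    """A dereference plus what a (hypothetical) pointer analysis reports for it."""
    line: int
    may_point_to: frozenset            # allocation sites the pointer may target
    proven_safe: bool = False          # analysis guarantees a valid target
    fixed_size: Optional[int] = None   # block size in bytes, if statically known


def stage1_checks(derefs):
    """Stage 1: choose a check (or none) for every dereference."""
    checks = {}
    for d in derefs:
        if d.proven_safe:
            continue                          # no check is inserted
        if d.fixed_size is not None:
            checks[d.line] = "bounds-only"    # compare offset with the known size
        else:
            checks[d.line] = "full"           # look the block up at run time
    return checks


def stage2_tracked_sites(derefs, checks):
    """Stage 2: track only allocation sites that a full check may consult."""
    tracked = set()
    for d in derefs:
        if checks.get(d.line) == "full":
            tracked |= d.may_point_to
    return tracked


derefs = [
    Deref(line=10, may_point_to=frozenset({"malloc@3"}), proven_safe=True),
    Deref(line=14, may_point_to=frozenset({"alloca@5"}), fixed_size=40),
    Deref(line=21, may_point_to=frozenset({"malloc@7", "malloc@9"})),
]
checks = stage1_checks(derefs)
tracked = stage2_tracked_sites(derefs, checks)
```

Here only the dereference at line 21 needs a full check, so only the two allocation sites it may target receive tracking calls; the allocation at `malloc@3` is never tracked.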

To evaluate the boosted instrumentation, we run the following three configurations of Symbiotic on 393 benchmarks of the SV-COMP 2017 meta category *MemSafety* and of the category *MemSafety-TerminCrafted*:


Figure 1 clearly shows that the performance improvement brought by the extended pointer analysis itself is negligible compared to the performance improvement delivered by the extended pointer analysis in combination with staged instrumentation. For a precise description of the boosted instrumentation, experimental setup and results, we refer to [3].

Symbiotic 5 also changed the approach to error witness generation. Symbiotic 4 describes an erroneous run by a sequence of passed program locations. The sequence is often very long and it turned out to be too restrictive for witness checkers. Symbiotic 5 provides only the starting and target locations of the run and the return values of some \_\_VERIFIER\_nondet\* calls. More precisely, we provide the return values of those calls that occur in main and are executed just once in the run. The witnesses are now more often confirmed by witness checkers.

### **2 Software Architecture**

All components of Symbiotic are built on top of llvm 3.9 [9]. We use the clang compiler to compile the analyzed sources into llvm bitcode. Symbiotic consists of scripts written in Python that distribute work to three basic modules, all written in C++:


Before and after slicing, we optimize the code using llvm's available optimizations. The remaining bitcode transformations that we use, whose nature is mostly technical (e.g. replacing calls inserted by clang's sanitizer with \_\_VERIFIER\_error calls), are implemented as llvm passes. All the components that transform bitcode take a bitcode as an input and give a valid bitcode as an output. This makes Symbiotic highly modular: any part (module) can be easily replaced or used as a stand-alone tool.

### **3 Strengths and Weaknesses**

The main strength of the approach is its universality and modularity. The instrumentation can reduce any safety property to reachability checks and therefore no special monitors need to be incorporated into the verification backend. Indeed, any tool that can decide reachability of error locations can be plugged-in.

The main disadvantage of the current configuration is that symbolic execution does not satisfactorily handle programs with unbounded loops. Moreover, Klee cannot generate invariants for loops.

### **4 Tool Setup and Configuration**

- --prp=file, which sets the property specification file to use,
- --witness=file, which sets the output file for the witness,
- --32, which sets the 32-bit environment,
- --help, which shows the full list of possible options.

### **5 Software Project and Contributors**

Symbiotic 5 has been developed by M. Chalupa and M. Vitovská under supervision of J. Strejček. The tool and its components are available under Apache-2.0 and MIT Licenses. The project is hosted by the Faculty of Informatics, Masaryk University. llvm and Klee are also available under open-source licenses. The project web page is: https://github.com/staticafi/symbiotic.

### **References**



# **Ultimate Automizer and the Search for Perfect Interpolants (Competition Contribution)**

Matthias Heizmann(B), Yu-Fang Chen, Daniel Dietsch, Marius Greitschus, Jochen Hoenicke, Yong Li, Alexander Nutz, Betim Musa, Christian Schilling, Tanja Schindler, and Andreas Podelski

> University of Freiburg, Freiburg im Breisgau, Germany heizmann@informatik.uni-freiburg.de

**Abstract.** Ultimate Automizer is a software verifier that generalizes proofs for traces to proofs for larger parts of the program. In recent years the portfolio of proof producers that are available to Ultimate has grown continuously. This is not only because more trace analysis algorithms have been implemented in Ultimate but also due to the continuous progress in the SMT community. In this paper we explain how Ultimate Automizer dynamically selects trace analysis algorithms and how the tool decides when proofs for traces are "good" enough for using them in the abstraction refinement.

### **1 Verification Approach**

Ultimate Automizer (in the following called Automizer) is a software verifier that is able to check safety and liveness properties. The tool implements an automata-based [6] instance of the CEGAR scheme. In each iteration, we pick a *trace* (which is a sequence of statements) that leads from the initial location to the error location and check whether the trace is *feasible* (i.e., corresponds to an execution) or *infeasible*. If the trace is feasible, we report an error to the user; otherwise we compute a sequence of predicates along the trace as a proof of the trace's infeasibility. We call such a sequence of predicates a sequence of *interpolants* since each predicate "interpolates" between the set of reachable states and the set of states from which we cannot reach the error. In the refinement step of the CEGAR loop, we try to find all traces whose infeasibility can be shown with the given predicates and subtract these traces from the set of (potentially spurious) error traces that have not yet been analyzed. We use automata to represent sets of traces; hence the subtraction is implemented as an automata operation. The major difference to a classical CEGAR-based predicate abstraction is that we never have to do any logical reasoning (e.g., SMT solver calls) that involves predicates of different CEGAR iterations.

We use this paper to explain how our tool obtains the interpolants that are used in the refinement step. The Ultimate program analysis framework provides a number of techniques to compute interpolants for an infeasible trace. We group them into the following two categories.


Recent improvements of Automizer were devoted to techniques that fall into the second category. Our basic paradigms are: (1) use different techniques to compute many sequences of interpolants, (2) evaluate the "quality" of each sequence, (3) prefer "good" sequences in the abstraction refinement.

In contrast to related work [3] we have only one measure for the quality of a sequence of interpolants: We check if the interpolants constitute a Floyd-Hoare annotation of the path program for the trace. If this is the case, we call the sequence a *perfect sequence of interpolants*. If the sequence is perfect, we use it for the abstraction refinement. If the sequence is not perfect, we only use it if no better sequence is available. Our portfolio of *trace focused techniques* is quite large for three reasons.


All our algorithms follow the same scheme: We replace all statements of the trace by skip statements. Then we incrementally check feasibility of the trace and undo replacements as long as the trace is feasible. Examples for the undo order of our algorithms are: (1) Apply the undo first to statements that occur outside of loops, follow the nesting structure of loops for further undo operations. (2) Do the very same as the first algorithm but start inside loops. (3) Apply the undo to statements with large constants later. (4) Apply the undo to statements whose SMT representation is less expensive first (e.g., postpone floating point arithmetic).
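The scheme above can be sketched in Python as follows. This is a toy illustration, not Ultimate's implementation: the statement encoding is hypothetical, and feasibility is decided by brute force over a tiny value domain where the real tool would call an SMT solver.

```python
from itertools import product

DOMAIN = range(-2, 3)      # tiny value domain, used only for brute force
VARS = ("x", "y")


def feasible(trace):
    """True if some initial state lets every assume statement in the trace pass."""
    for values in product(DOMAIN, repeat=len(VARS)):
        state = dict(zip(VARS, values))
        ok = True
        for stmt in trace:
            if stmt[0] == "assign":
                state[stmt[1]] = stmt[2]
            elif stmt[0] == "assume" and not stmt[1](state):
                ok = False
                break
        if ok:
            return True
    return False


def relevant_statements(trace, undo_order):
    """Start from an all-skip trace; undo replacements while it stays feasible."""
    current = [("skip",)] * len(trace)
    restored = []
    for i in undo_order:
        if not feasible(current):
            break                  # infeasibility shown; remaining skips stay
        current[i] = trace[i]
        restored.append(i)
    return sorted(restored)


trace = [
    ("assign", "x", 0),                    # x := 0
    ("assign", "y", 1),                    # y := 1 (irrelevant for infeasibility)
    ("assume", lambda s: s["x"] > 0),      # assume x > 0
]
in_program_order = relevant_statements(trace, (0, 1, 2))
assume_first = relevant_statements(trace, (2, 0, 1))
```

With the undo order `(0, 1, 2)` all three statements are restored before infeasibility shows up, whereas the order `(2, 0, 1)` stops after restoring statements 2 and 0 and leaves the irrelevant assignment to `y` as a skip. This is why different undo orders yield different proofs for the same trace.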

At first glance it looks like a good idea to apply different techniques to a given trace for as long as no perfect sequence of interpolants was found. This has however turned out to be a bad idea for the following reasons.


We conclude that per iteration of the CEGAR loop (resp. per trace) we only want to apply a fixed number of techniques. According to our experiments there are some techniques that are on average more successful than others; however, no technique is strictly superior to another. Hence it is neither a good idea to always apply the *n* typically most successful techniques nor to take *n* random techniques in each iteration.

We follow an approach that we call *path program-based modulation*. We have a preferred sequence in which we apply our techniques. Whenever we see a *new* trace we start at the beginning of this sequence. Whenever we see a trace that is *similar* to a trace we have already seen, we continue in the sequence of techniques at the point where we stopped for the similar trace. Our notion of similarity is: Two traces are similar if they have the same path program.

Hence we make sure that for every path program every technique is eventually applied to some trace of the path program.
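A minimal Python sketch of this bookkeeping is given below. Several details are assumptions made for illustration: the technique names are placeholders, a path program is approximated by the set of statements occurring in the trace, and the sketch wraps around when it reaches the end of the technique sequence, which the text above does not specify.

```python
TECHNIQUES = ["technique-A", "technique-B", "technique-C"]   # placeholder names


def path_program_key(trace):
    # Assumption for this sketch: two traces have the same path program
    # if and only if they use the same set of statements.
    return frozenset(trace)


class Modulation:
    def __init__(self, techniques, per_trace):
        self.techniques = techniques
        self.per_trace = per_trace     # fixed number of techniques per trace
        self.position = {}             # path program -> next index in the sequence

    def techniques_for(self, trace):
        key = path_program_key(trace)
        start = self.position.get(key, 0)    # new path program: start at the front
        n = len(self.techniques)
        chosen = [self.techniques[(start + k) % n] for k in range(self.per_trace)]
        self.position[key] = (start + self.per_trace) % n
        return chosen


m = Modulation(TECHNIQUES, per_trace=2)
first = m.techniques_for(("a", "loop", "b"))            # new path program
second = m.techniques_for(("a", "loop", "loop", "b"))   # similar trace
other = m.techniques_for(("c",))                        # unrelated trace
```

The second trace unrolls the loop once more but has the same statement set, so it continues with the third technique instead of repeating the first two; the unrelated trace starts again at the beginning of the sequence.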

### **2 Project, Setup and Configuration**

Automizer is developed on top of the open-source program analysis framework Ultimate<sup>1</sup>. Ultimate is mainly developed at the University of Freiburg and has received contributions from more than 50 people. The framework and Automizer are written in Java, licensed under LGPLv3, and their source code is available on GitHub<sup>2</sup>.

<sup>1</sup> https://ultimate.informatik.uni-freiburg.de.

<sup>2</sup> https://github.com/ultimate-pa/ultimate.

Automizer's competition submission is available as a zip archive<sup>3</sup>. It requires a current Java installation (≥JRE 1.8) and a working Python 2.7 installation. The archive contains Linux binaries for Automizer and the required SMT solvers Z3<sup>4</sup>, CVC4<sup>5</sup>, and Mathsat<sup>6</sup>, as well as a Python script, Ultimate.py. The Python script translates command line parameters and results between Ultimate and SV-COMP conventions, and ensures that Ultimate is correctly configured to run Automizer. Automizer is invoked through Ultimate.py by calling

```
./Ultimate.py --spec prop.prp --file input.c --architecture
  32bit|64bit --full-output [--validate witness.graphml]
```
where prop.prp is the SV-COMP property file, input.c is the C file that should be analyzed, 32bit or 64bit is the architecture of the input file, and --full-output enables writing all output instead of just the status of the property to stdout. The option --validate witness.graphml is only used during witness validation and allows the specification of a file containing a violation [2] or correctness witness [1].

Depending on the status of the property, a violation or correctness witness may be written to the file witness.graphml. Automizer is not only able to generate witnesses, but also to validate them<sup>7</sup>. In any case, the complete output of Automizer is written to the file Ultimate.log.

The benchmarking tool BenchExec<sup>8</sup> contains a tool-info module that provides support for Automizer (ultimateautomizer.py). Automizer participates in all categories, which is also specified in its SV-COMP benchmark definition<sup>9</sup> file uautomizer.xml. In its role as witness validator, Automizer supports all categories except ConcurrencySafety, which is specified in the corresponding SV-COMP benchmark definition files uautomizer-validate-\*-witnesses.xml.

### **References**


<sup>3</sup> https://ultimate.informatik.uni-freiburg.de/downloads/svcomp2018/UltimateAutomizer-linux.zip.

<sup>4</sup> https://github.com/Z3Prover/z3.

<sup>5</sup> https://cvc4.cs.nyu.edu/.

<sup>6</sup> http://mathsat.fbk.eu/.

<sup>7</sup> https://github.com/sosy-lab/sv-witnesses.

<sup>8</sup> https://github.com/sosy-lab/benchexec.

<sup>9</sup> https://github.com/sosy-lab/sv-comp.



# **Ultimate Taipan with Dynamic Block Encoding (Competition Contribution)**

Daniel Dietsch(B), Marius Greitschus(B), Matthias Heizmann, Jochen Hoenicke, Alexander Nutz, Andreas Podelski, Christian Schilling, and Tanja Schindler

> University of Freiburg, Freiburg im Breisgau, Germany {dietsch,greitsch}@informatik.uni-freiburg.de

**Abstract.** Ultimate Taipan is a software model checker that uses trace abstraction and abstract interpretation to prove correctness of programs. In contrast to previous versions, Ultimate Taipan now uses dynamic block encoding to obtain the best precision possible when evaluating transition formulas of large block encoded programs.

### **1 Verification Approach**

Ultimate Taipan (or Taipan for brevity) is a software model checker which combines trace abstraction [9,10] and abstract interpretation [5]. The algorithm of Taipan [8] iteratively refines an abstraction of an input program by analyzing counterexamples (cf. CEGAR [4]).

The initial abstraction of the program is an automaton with the same graph structure as the program's control flow graph, where program locations are states, transitions are labeled with program statements, and error locations are accepting. Thus, the language of the automaton consists of all traces, i.e., sequences of statements, that, if executable, lead to an error. In each iteration, the algorithm chooses a trace from the language of the current automaton and constructs a path program from it. A path program is a projection of the (abstraction of the) program to the trace. The algorithm then uses abstract interpretation to compute fixpoints for the path program. If the fixpoints of the path program are sufficient to prove correctness, i.e., the error location is unreachable, at least the chosen trace and all other traces that are covered by the path program are infeasible. The computed fixpoints constitute a proof of correctness for the path program and can be represented as a set of state assertions. From this set of state assertions, the abstraction is refined by constructing a new automaton whose language only consists of infeasible traces and then subtracting it from the current abstraction using an automata-theoretic difference operation. If abstract interpretation was unable to prove correctness of the path program, the algorithm obtains a proof of infeasibility of the trace using either interpolating SMT solvers or a combination of unsatisfiable cores and strongest post or weakest pre [6]. If the currently analyzed trace is feasible,

© The Author(s) 2018

D. Beyer and M. Huisman (Eds.): TACAS 2018, LNCS 10806, pp. 452–456, 2018. https://doi.org/10.1007/978-3-319-89963-3\_31

**(a)** Program in Boogie.

**(b)** No block encoding.

**(c)** Large block encoding.

**Fig. 1.** Example program.

the trace represents a program execution that can reach the error. If the current automaton becomes empty after a difference operation, all potential error traces have been proven to be infeasible.

**Dynamic Block Encoding.** Large block encoding [1] is a technique to reduce the number of locations in a control flow graph. As Taipan relies on trace abstraction, the number of locations determines the performance of the automata operations, which impact the overall performance significantly. It is therefore beneficial to use a strong block encoding that removes as many locations as possible. Unfortunately, the resulting transitions can lead to a loss of precision during the application of an abstract post operator. Consider the example program and its control flow graph with different block encodings shown in Fig. 1. Each control flow graph consists of a set of program locations $LOC$, an initial location ($\ell_3$ in Fig. 1), a set of error locations ($\{\ell_6\}$ in Fig. 1), and a transition relation $\rightarrow \subseteq LOC \times TF \times LOC$ which defines the transitions between the locations and labels each transition with a *transition formula* from the set of transition formulas $TF$. Transition formulas encode the semantics of the program as first-order logic formulas over various SMT theories. In Ultimate, a transition formula $\psi$ is a tuple $(\varphi, IN, OUT, AUX, pv)$ where $\varphi$ is a closed formula over the three disjoint sets of input ($IN$), output ($OUT$), and auxiliary ($AUX$) variables, and $pv : IN \cup OUT \rightarrow V$ is an injective function that maps variables occurring in $\varphi$ to program variables. We write output variables as primed variables and input variables as unprimed variables.

Taipan computes a fixpoint for each location of a control flow graph by (repeatedly) applying an abstract post operator $post^\#$ to these transition formulas. To this end, an abstract domain $D = (A, \alpha, \gamma, \sqcup, \sqcap, \nabla, post^\#)$ is used, where $A$ is a complete lattice representing all possible abstract states containing the designated abstract states $\top$ and $\bot$, $\alpha$ is an abstraction function, $\gamma$ is a concretization function, $\sqcup$ is a join operator, $\sqcap$ is a meet operator, $\nabla$ is a widening operator, and $post^\# : A \times TF \rightarrow A$ is an abstract transformer which computes an abstract post state $\sigma'$ from a given abstract pre-state $\sigma$ and a transition formula $\psi$. Taipan uses a combination of octagons [11] and sets of divisibility congruences [7] as abstract domain, but for brevity we explain the example using intervals.

In rows 1 to 3 of Table 1, we apply $post^\#$ of the interval domain in sequence to each of the transition formulas from Fig. 1b. In rows 4a and 4b we apply


**Table 1.** Application of $post^\#$ for transition formulas from Fig. 1.

the same operator to the only transition formula of Fig. 1c, but process the conjunction in different orders. Although the logical $\wedge$-operator is commutative, the result differs. This is due to different ways of computing the abstract post state. We can express $post^\#(\sigma, A \wedge B) = \sigma'$ either as $post^\#(\sigma, A) \sqcap post^\#(\sigma, B)$, as $post^\#(post^\#(\sigma, A), B)$, or as $post^\#(post^\#(\sigma, B), A)$. The interval domain cannot express the equality relation between two variables (i.e., the conjunct $b = a$); therefore, the first way will compute $post^\#(\{a : \top, b : \top\}, b = a) = \{a : \top, b : \top\}$, effectively rendering the constraint useless. The second and third way may succeed, depending on the ordering of conjuncts. In general, the ordering is important, but in our example, it does not matter as long as $b = a$ is not first.
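The effect of the three evaluation strategies can be reproduced with a small interval-domain model in Python. The sketch rests on assumptions: the conjunct `a = 1` is a hypothetical stand-in (the formula of Fig. 1c is not reproduced here), conjuncts are treated as constraints on a single state without primed variables, and empty intervals are not handled.

```python
import math

TOP = (-math.inf, math.inf)


def meet(i, j):
    # Intersection of two intervals (the empty case is ignored in this sketch).
    return (max(i[0], j[0]), min(i[1], j[1]))


def post(state, conjunct):
    """Interval post for two conjunct shapes: ("const", v, c) and ("eq", v, w)."""
    new = dict(state)
    if conjunct[0] == "const":             # v = c
        _, v, c = conjunct
        new[v] = meet(state[v], (c, c))
    else:                                  # v = w: both receive the intersection
        _, v, w = conjunct
        new[v] = new[w] = meet(state[v], state[w])
    return new


def meet_states(s, t):
    return {v: meet(s[v], t[v]) for v in s}


sigma = {"a": TOP, "b": TOP}
A = ("const", "a", 1)      # hypothetical conjunct a = 1
B = ("eq", "b", "a")       # the conjunct b = a

parallel = meet_states(post(sigma, A), post(sigma, B))   # post(σ, A) ⊓ post(σ, B)
a_then_b = post(post(sigma, A), B)                       # post(post(σ, A), B)
b_then_a = post(post(sigma, B), A)                       # post(post(σ, B), A)
```

Only `a_then_b` derives the interval `[1, 1]` for `b`. Both the meet-based evaluation and the order that processes `b = a` first leave `b` at top, because the equality carries no interval information while `a` is still unconstrained.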

In Taipan, we solve this problem by introducing the notion of *expressibility* to an abstract domain. We augment each abstract domain with an expressibility predicate $ex$ which decides for each non-logical symbol of a transition formula (i.e., each relation, function application, variable, and constant) whether it can be represented in the domain. For example, the interval domain can represent all relations that contain at most one variable, while octagons can represent all relations of the form $\pm x \pm y \leq c$. We then apply $post^\#$ on conjuncts of a transition formula in an order induced by $ex$, thus effectively choosing a new *dynamic* block encoding. For $post^\#(\sigma, \varphi)$, our algorithm computes $\sigma'$ by first converting the formula $\varphi$ to DNF s.t. $\varphi = \varphi_0 \vee \varphi_1 \vee \ldots \vee \varphi_n$. For each disjunct $\varphi_i = \varphi_i^0 \wedge \varphi_i^1 \wedge \ldots \wedge \varphi_i^m$, we compute $post^\#(\sigma, \varphi_i) = \sigma'_i$ as follows:


The result for $post^\#(\sigma, \psi)$ is then $\bigsqcup_{i=0}^{n} \sigma'_i = \sigma'$.
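One possible reading of this computation is sketched below in Python for the interval domain. It is an assumption-laden illustration rather than Taipan's algorithm: the expressibility predicate is applied to whole conjuncts instead of individual symbols, expressible conjuncts are simply processed first, the example formula is hypothetical, and empty intervals are not handled.

```python
import math

TOP = (-math.inf, math.inf)


def meet(i, j):
    return (max(i[0], j[0]), min(i[1], j[1]))


def join(i, j):
    return (min(i[0], j[0]), max(i[1], j[1]))


def expressible(conjunct):
    # Interval domain: relations over at most one variable are expressible.
    return conjunct[0] == "const"


def post_conjunct(state, conjunct):
    new = dict(state)
    if conjunct[0] == "const":             # ("const", v, c) means v = c
        _, v, c = conjunct
        new[v] = meet(state[v], (c, c))
    else:                                  # ("eq", v, w) means v = w
        _, v, w = conjunct
        new[v] = new[w] = meet(state[v], state[w])
    return new


def post_dnf(state, disjuncts):
    """Join, over all disjuncts, the post state of the reordered conjuncts."""
    result = None
    for conjuncts in disjuncts:
        # Stable sort: expressible conjuncts first, original order otherwise.
        ordered = sorted(conjuncts, key=lambda c: not expressible(c))
        s = state
        for c in ordered:
            s = post_conjunct(s, c)
        result = s if result is None else {v: join(result[v], s[v]) for v in s}
    return result


sigma = {"a": TOP, "b": TOP}
phi = [
    [("eq", "b", "a"), ("const", "a", 1)],   # b = a ∧ a = 1
    [("eq", "b", "a"), ("const", "a", 3)],   # b = a ∧ a = 3
]
sigma_post = post_dnf(sigma, phi)
```

Although `b = a` is written first in both disjuncts, the reordering evaluates the constant constraint before it, so each disjunct yields a precise interval for `b`, and the join over both disjuncts gives `[1, 3]` for `a` and `b`.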

### **2 Project, Setup and Configuration**

Taipan is a part of the open-source program analysis framework Ultimate<sup>1</sup>, written in Java, licensed under LGPLv3<sup>2</sup>, and open source<sup>3</sup>. The Taipan competition submission is available as a zip archive<sup>4</sup>. It requires a current Java installation (≥JRE 1.8) and a working Python 2.7 installation. The submission contains an executable version of Taipan for Linux platforms, the binaries of the required SMT solvers Z3<sup>5</sup>, CVC4<sup>6</sup>, and Mathsat<sup>7</sup>, as well as a Python script, Ultimate.py, which maps the SV-COMP interface to Ultimate's command line interface and selects the correct settings and the correct toolchain. In SV-COMP, Taipan is invoked through Ultimate.py with

```
./Ultimate.py --spec prop.prp --file input.c --architecture
                 32bit|64bit --full-output
```
where prop.prp is the SV-COMP property file, input.c is the C file that should be analyzed, 32bit or 64bit is the architecture of the input file, and --full-output enables writing all output instead of just the status of the property to stdout. The complete output of Taipan is also written to the file Ultimate.log. Depending on the status of the property, a violation [3] or correctness [2] witness may be written to the file witness.graphml.

The benchmarking tool BenchExec<sup>8</sup> supports Taipan through the tool-info module ultimatetaipan.py. Taipan participates in all categories, as specified by its SV-COMP benchmark definition<sup>9</sup> file utaipan.xml.

#### **References**


<sup>1</sup> https://ultimate.informatik.uni-freiburg.de.

<sup>2</sup> https://www.gnu.org/licenses/lgpl-3.0.en.html.

<sup>3</sup> https://github.com/ultimate-pa/ultimate/.

<sup>4</sup> https://ultimate.informatik.uni-freiburg.de/downloads/svcomp2018/UltimateTaipan-linux.zip.

<sup>5</sup> https://github.com/Z3Prover/z3.

<sup>6</sup> https://cvc4.cs.nyu.edu/.

<sup>7</sup> http://mathsat.fbk.eu/.

<sup>8</sup> https://github.com/sosy-lab/benchexec.

<sup>9</sup> https://github.com/sosy-lab/sv-comp.



# **VeriAbs: Verification by Abstraction and Test Generation (Competition Contribution)**

Priyanka Darke(B), Sumanth Prabhu, Bharti Chimdyalwar, Avriti Chauhan, Shrawan Kumar, Animesh Basakchowdhury, R. Venkatesh, Advaita Datar, and Raveendra Kumar Medicherla

Tata Research Development and Design Centre, Pune, India *{*priyanka.darke,sumanth.prabhu,bharti.c,avriti.chauhan,shrawan.kumar, a.basakchowdhury,r.venky,advaita.datar,raveendra.kumar*}*@tcs.com

**Abstract.** VeriAbs is a portfolio software verifier for ANSI-C programs. To prove properties with better efficiency and scalability, this version implements output abstraction with *k*-induction in the presence of resets. VeriAbs now generates post conditions over the abstraction to find invariants by applying Z3's tactics of quantifier elimination. These invariants are then used to generate validation witnesses. To find errors in the absence of known program bounds, VeriAbs searches for property violating inputs by applying random test generation with fuzz testing for better scalability compared to bounded model checking.

### **1 Verification Approach**

**Background.** VeriAbs implements abstract acceleration [5] and *k*-induction to scale Bounded Model Checking (BMC) to programs with loops of large or unknown bounds. It abstracts such loops to loops of known small bounds, which BMC can then prove. This abstraction is achieved by accelerating selected variables processed inside the loops. Further, VeriAbs applies incremental *k*-induction to improve precision. Loops that process arrays of large or unknown sizes are replaced by abstract loops that execute a small, nondeterministically chosen sequence of the original loop iterations. The idea is based on the concept of *loop shrinkability* [10].
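To make the idea of incremental *k*-induction concrete, the following is a minimal sketch over an explicit, finite transition system. It is not how VeriAbs implements the technique (VeriAbs works on C programs with a bounded model checker); the function `k_induction`, the state space, and the transition table `SAFE_NEXT` are hypothetical names chosen for illustration.

```python
def k_induction(states, init, step, prop, max_k=5):
    """Incremental k-induction on an explicit finite transition system.

    states: all states; init: initial states;
    step: state -> iterable of successors; prop: state -> bool.
    Returns ("safe", k), ("unsafe", k) or ("unknown", max_k).
    """
    states = list(states)
    for k in range(1, max_k + 1):
        # Base case: prop holds on every state reachable in fewer than k steps.
        frontier = set(init)
        for _ in range(k):
            if any(not prop(s) for s in frontier):
                return ("unsafe", k)
            frontier = {t for s in frontier for t in step(s)}
        # Inductive step: any k consecutive prop-states (starting from an
        # arbitrary state) can only be followed by a prop-state.
        paths = [(s,) for s in states if prop(s)]
        for _ in range(k - 1):
            paths = [p + (t,) for p in paths for t in step(p[-1]) if prop(t)]
        if all(prop(t) for p in paths for t in step(p[-1])):
            return ("safe", k)
    return ("unknown", max_k)


# States 0, 1, 2 form a reachable cycle; 3 is unreachable but leads to bad state 4.
SAFE_NEXT = {0: [1], 1: [2], 2: [0], 3: [4], 4: [4]}
result = k_induction(range(5), [0], lambda s: SAFE_NEXT[s], lambda s: s != 4)
```

In this toy system, plain induction (k = 1) fails because the unreachable state 3 satisfies the property but steps to the bad state 4. With k = 2 no two consecutive good states end in 3, so the property is proved.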

### **1.1 Tool Enhancements**

For SV-COMP 2018, VeriAbs has been supplemented with an efficient implementation of output abstraction to prove properties, random test generation with fuzzing to find errors, and witness generation.

P. Darke: Jury member.

© The Author(s) 2018

D. Beyer and M. Huisman (Eds.): TACAS 2018, LNCS 10806, pp. 457–462, 2018. https://doi.org/10.1007/978-3-319-89963-3\_32

**Output Abstraction.** The SV-COMP 2017 version of VeriAbs cannot precisely validate programs with loops in which all variables are modified with non-linear arithmetic expressions or resets. For such programs, the current version applies an improved output abstraction [13] that simply replaces the corresponding loop with non-deterministic assignments to all the modified variables.
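The effect of this abstraction can be sketched on a toy program. The example below is an illustration under stated assumptions, not VeriAbs code: the program, the names `original`, `abstracted`, and `holds_on_abstraction`, and the small finite range `HAVOC` (standing in for "any integer value") are all hypothetical.

```python
def original(n):
    """Concrete program: the loop modifies x and y non-linearly; z is untouched."""
    x, y, z = 1, 0, 5
    for _ in range(n):
        x = x * x + 1
        y = x * y + 2
    return {"x": x, "y": y, "z": z}


def abstracted(havoc_values):
    """Abstract program: the loop is replaced by non-deterministic
    assignments to the modified variables x and y (enumerated here)."""
    for x in havoc_values:
        for y in havoc_values:
            yield {"x": x, "y": y, "z": 5}


def holds_on_abstraction(prop, havoc_values):
    """A property is proved only if it holds for every havocked state."""
    return all(prop(state) for state in abstracted(havoc_values))


HAVOC = range(-3, 4)  # assumption: a tiny stand-in for the full integer range
proved_z = holds_on_abstraction(lambda s: s["z"] == 5, HAVOC)
proved_x = holds_on_abstraction(lambda s: s["x"] >= 1, HAVOC)
```

Here `proved_z` is true, because the property only depends on a variable the loop does not modify. `proved_x` is false even though `x >= 1` holds in the concrete program, which shows the loss of precision that the abstraction accepts in exchange for simplicity.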

**Search for Property-Violating Inputs.** To compensate for the lack of abstraction refinement, VeriAbs searches for a property-violating input. To this end, it uses *fuzz testing* to find an input that reaches the error location. Fuzz testing aims to uncover run-time errors by executing the target program on a large number of automatically and systematically generated inputs. Grey-box fuzzing [3] is a fuzz testing technique that uses lightweight instrumentation to observe the behavior of the target program on a test run, and uses this information to generate new test inputs that might exhibit new program behaviors. VeriAbs uses American Fuzzy Lop (AFL-fuzz) [12] as its fuzz testing tool.
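The feedback loop behind grey-box fuzzing can be illustrated with a toy coverage-guided search. This sketch is not AFL's algorithm and uses none of its code. The program `target`, the branch identifiers, and the exhaustive single-byte mutation strategy are assumptions chosen so that the example is deterministic.

```python
from collections import deque


def target(data):
    """Toy program under test: returns (covered_branches, reached_error)."""
    covered = set()
    if data[0] == ord("b"):
        covered.add(1)
        if data[1] == ord("u"):
            covered.add(2)
            if data[2] == ord("g"):
                covered.add(3)
                return covered, True  # error location reached
    return covered, False


def fuzz(seed, max_runs=10_000):
    """Coverage-guided search: keep only mutants that reach new branches."""
    seen = set()
    queue = deque([bytes(seed)])
    runs = 0
    while queue:
        parent = queue.popleft()
        for pos in range(len(parent)):
            for value in range(256):
                if runs >= max_runs:
                    return None, runs
                # Mutate a single byte of the parent input.
                child = parent[:pos] + bytes([value]) + parent[pos + 1:]
                runs += 1
                covered, error = target(child)
                if error:
                    return child, runs
                if not covered <= seen:  # new behavior observed: keep the input
                    seen |= covered
                    queue.append(child)
    return None, runs


found, runs = fuzz(b"aaa")
```

Because inputs that uncover a new branch are kept and mutated further, the search reaches the error after a few thousand executions, whereas guessing all three bytes at once would have to search a space of 256³ inputs.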

**Witness Generation.** The previous version of VeriAbs used CPAchecker [2] to generate validation witnesses from abstract programs. The SV-COMP 2018 version implements techniques for generating both correctness and error witnesses. If VeriAbs concludes that the input program is safe, it generates a correctness witness with loop invariants. These invariants are obtained by computing the strongest-postcondition equation using the methods presented in [8], except for loops where the loop acceleration information is used instead. The resulting invariants can contain quantifiers and non-program variables. However, SV-COMP 2017 witness validators recognize only invariants that are expressed as C expressions over program variables. VeriAbs therefore uses Z3 [6] to eliminate quantifiers and non-program variables from the invariants. The invariants are then added to the control flow automaton generated by CPAchecker to produce the validation witness.
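As a small worked illustration of this elimination step (the variables and the invariant are hypothetical and not taken from VeriAbs), suppose a postcondition computation yields an invariant that mentions an iteration counter $k$ that is not a program variable:

$$\exists k.\; (0 \le k \le n) \wedge (x = 2k) \wedge (y = k)$$

Since $y = k$ determines the witness for $k$, eliminating the quantifier gives an equivalent formula over program variables only:

$$(0 \le y \le n) \wedge (x = 2y)$$

Ignoring integer overflow, this can be written as the C expression `0 <= y && y <= n && x == 2*y`, which is the form a witness validator can consume.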

The error witness generation technique depends on the strategy that was used to falsify the input program. When VeriAbs finds the input program unsafe by fuzz testing (i.e., using AFL-fuzz [12]), it generates a violation witness containing a valuation of variables at the program points that assign non-deterministic values to program variables. This is achieved by replaying the execution that caused the property violation on an instrumented version of the input program, which prints this valuation. To avoid file latency, the instrumented program is used only to replay the error execution. The values thus obtained are used to generate the error witness. If, on the other hand, the input program was found unsafe by BMC, the corresponding error witness is used.
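The last step, turning replayed values into a witness, might look roughly like the sketch below. It is a simplified assumption rather than the VeriAbs implementation. The function `violation_witness` and the `(line, variable, value)` log format are hypothetical. The GraphML data keys (`witness-type`, `entry`, `violation`, `startline`, `assumption`) follow our reading of the SV-COMP witness exchange format and should be checked against its specification, which also requires metadata (key declarations, producer, program hash, and so on) that is omitted here.

```python
import xml.etree.ElementTree as ET


def violation_witness(valuations):
    """Build a simplified GraphML violation witness.

    valuations: list of (line, variable, value) tuples recorded while
    replaying the failing run, one per non-deterministic assignment.
    """
    root = ET.Element("graphml")
    graph = ET.SubElement(root, "graph", edgedefault="directed")
    ET.SubElement(graph, "data", key="witness-type").text = "violation_witness"

    def add_node(index, **flags):
        node = ET.SubElement(graph, "node", id=f"N{index}")
        for key, value in flags.items():
            ET.SubElement(node, "data", key=key).text = value

    add_node(0, entry="true")
    for i, (line, var, value) in enumerate(valuations, start=1):
        add_node(i)
        edge = ET.SubElement(graph, "edge", source=f"N{i - 1}", target=f"N{i}")
        ET.SubElement(edge, "data", key="startline").text = str(line)
        # The assumption pins the non-deterministic value seen in the replay.
        ET.SubElement(edge, "data", key="assumption").text = f"{var} == {value};"
    last = len(valuations)
    add_node(last + 1, violation="true")
    ET.SubElement(graph, "edge", source=f"N{last}", target=f"N{last + 1}")
    return ET.tostring(root, encoding="unicode")


xml_text = violation_witness([(4, "x", 7), (5, "y", -1)])
```

Each edge constrains one non-deterministic assignment to the value observed during the replay, so a validator can follow the same path to the violation node.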

**Array Loop Abstraction.** We abstract loops that process arrays of large or unknown sizes and carry a quantified property, using a method based on the idea of *loop shrinkability* [10]. We call an array-processing loop *k*-shrinkable when the original program is guaranteed to be correct if executing every sequence of *k* iterations of the original loop satisfies the property projected onto the chosen sequence. A *k*-shrinkable loop is replaced with an abstract loop that executes a non-deterministically chosen sequence of *k* iterations of the original loop, and the property is translated so that it is checked only over the array elements corresponding to the chosen iterations. The *k*-shrinkability criterion ensures that if the program is incorrect, then the translated property is violated for some sequence of *k* iterations in the abstract program.
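A minimal sketch of the abstract loop follows. It assumes the loop has already been established to be *k*-shrinkable by the criterion of [10], which the sketch does not check, and it enumerates all choices of *k* iterations where a model checker would explore a non-deterministic choice. The names `run_abstract`, `fill_bad`, and `ordered` are hypothetical.

```python
from itertools import combinations


def run_abstract(n, k, body, prop):
    """Check the abstract loop for every choice of k iterations out of n.

    body(a, i) performs iteration i of the original loop on array a;
    prop(a, chosen) is the property projected onto the chosen indices.
    Returns the first violating (chosen, array) pair, or None.
    """
    for chosen in combinations(range(n), k):
        a = [None] * n          # cells of skipped iterations stay unconstrained
        for i in chosen:        # execute only the chosen iterations, in order
            body(a, i)
        if not prop(a, chosen):
            return chosen, a
    return None


def fill_ok(a, i):
    a[i] = i * i


def fill_bad(a, i):
    a[i] = 100 if i == 3 else i * i   # injected bug at index 3


def ordered(a, chosen):
    """Projection of 'for all i < j: a[i] <= a[j]' onto the chosen indices."""
    return all(a[i] <= a[j] for i, j in combinations(chosen, 2))


ok_result = run_abstract(6, 2, fill_ok, ordered)
counterexample = run_abstract(6, 2, fill_bad, ordered)
```

For the correct loop no pair of iterations violates the projected property. For the buggy loop the pair of iterations (3, 4) already exposes the violation, without executing the other four iterations.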

### **2 Verification Process and Software Architecture**

The verification process of VeriAbs is shown in Fig. 1. VeriAbs passes the input C file to a Tata Consultancy Services (TCS) [1] in-house C front end to generate the intermediate representation (IR) of the program. It then analyzes this IR using PRISM, a TCS in-house program analysis framework [9], to perform the abstractions and instrumentation. It uses the C Bounded Model Checker (CBMC) [4] version 5.8 with MiniSat [7] to validate the abstraction or the original program of known bounds. VeriAbs generates correctness witnesses by computing loop invariants using strongest postconditions. It uses Z3 version 4.5.1 to eliminate quantifiers, as SV-COMP requires invariants to be expressed as C expressions. These simplified invariants are added to the control flow automaton generated by CPAchecker version 1.6.1 [2]. VeriAbs uses CBMC version 5.8 for generating error witnesses. For fuzz testing, VeriAbs uses AFL-fuzz [12] version 2.35b. For program falsification, it invokes CBMC and AFL-fuzz sequentially.

**Fig. 1.** The verification process of VeriAbs - enhancements are highlighted

The SV-COMP 2018 version of VeriAbs first analyzes every loop to check if it contains some linear modifications to numerical variables so that they can be precisely validated by Loop Abstraction for BMC (LABMC) [5]. If this check passes, it applies a range analysis [11] to identify ranges of those variables. On the other hand, when all variables are non-linearly modified a simpler output abstraction is applied. If the loop reads or modifies arrays, then it applies array loop abstraction as explained in Sect. 1, and then applies BMC to validate the abstraction. To find errors, VeriAbs uses the new program instrumentation for violation witness generation and grey-box fuzzing with AFL to generate witnesses for such programs.

### **3 Strengths and Weaknesses**

The main strength of VeriAbs is that it is sound. All transformations implemented by the tool are over-approximations. CBMC provides an option (unwinding-assertions) that ensures sufficient unwinding for proving the property. Hence, if the tool reports that a property holds, then it indeed holds. Another key strength is that it transforms all loops in a program into abstract loops with a known finite number of iterations, enabling the use of bounded model checkers for property proving. The main weakness of the tool is that it does not implement a refinement process that is well suited to finding errors. It can, however, find errors using fuzz testing and bounded model checking. VeriAbs depends on Z3 for eliminating quantifiers and non-program variables from correctness witness invariants, and on CPAchecker for generating program automata. Compared to the SV-COMP 2017 version, VeriAbs performed significantly better this year in the Arrays, Loops, ECA, Sequentialized, and Recursive subcategories.
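The role of unwinding assertions in keeping bounded checking sound can be sketched as follows. This models only the effect of such a check on a single deterministic loop and is not CBMC's implementation. The function `bounded_check` and its result strings are hypothetical.

```python
def bounded_check(init, cond, body, prop, unwind):
    """Unroll `while cond(s): s = body(s)` at most `unwind` times.

    Returns "violated" if prop fails on an explored state,
    "unwinding assertion failed" if the loop could run beyond the bound,
    and "holds" only if the loop was explored completely.
    """
    state = init
    if not prop(state):
        return "violated"
    for _ in range(unwind):
        if not cond(state):
            break
        state = body(state)
        if not prop(state):
            return "violated"
    if cond(state):  # unwinding assertion: the loop must have exited by now
        return "unwinding assertion failed"
    return "holds"


# Loop: i = 0; while (i < 10) i++;  with the assertion i <= 10.
too_shallow = bounded_check(0, lambda i: i < 10, lambda i: i + 1, lambda i: i <= 10, 5)
deep_enough = bounded_check(0, lambda i: i < 10, lambda i: i + 1, lambda i: i <= 10, 10)
```

With a bound of 5 the check refuses to report that the property holds, because the loop has not been exhausted. Only with a bound of 10 does it report "holds". This is why abstracting every loop to a known small bound lets a bounded model checker produce proofs rather than bounded guarantees.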

### **4 Tool Setup and Configuration**

The VeriAbs SV-COMP 2018 executable is available for download at the URL http://www.cmi.ac.in/~madhukar/veriabs/VeriAbs.zip. To install the tool, download the archive, extract its contents, and then follow the installation instructions in VeriAbs/INSTALL.txt. To execute VeriAbs, the user needs to specify the property file of the respective verification category using the --property-file option. The witness is generated in the current working directory as witness.graphml. A sample command is as follows:

VeriAbs/scripts/veriabs --property-file ALL.prp example.c

VeriAbs is participating in the ReachSafety category. The BenchExec wrapper script for the tool is veriabs.py and veriabs.xml is the benchmark description file.

### **5 Software Project and Contributors**

VeriAbs is a verification tool maintained by TCS Research [1], and parts of it have been developed by the authors, Mohammad Afzal and other members of this organization. We would like to thank Charles Babu M and other interns who have contributed to the development of VeriAbs.

### **References**




# Author Index

Ábrahám, Erika II-287 Andrianov, Pavel II-427 Aronis, Stavros II-229 Baarir, Souheib I-99 Backes, John II-176 Balasubramanian, A. R. II-38 Bansal, Kshitij I-115 Barbosa, Haniel II-112 Barreto, Raimundo II-437 Barrett, Clark II-55 Basakchowdhury, Animesh II-457 Basin, David I-344 Bastian, Robert I-270 Becker, Heiko I-270 Bertrand, Nathalie II-38 Biere, Armin II-75 Bodei, Chiara I-344 Bodík, Rastislav I-251 Bortolussi, Luca II-396 Brázdil, Tomáš I-385 Bryant, Randal E. I-81 Budde, Carlos E. II-340 Cauderlier, Raphaël I-172 Češka, Milan II-155 Chalupa, Marek II-442 Champion, Adrien I-365 Chatterjee, Krishnendu I-385 Chauhan, Avriti II-457 Chen, Yu-Fang II-447 Chiba, Tomoya I-365 Chimdyalwar, Bharti II-457 Chini, Peter II-20 Ciardo, Gianfranco I-328 Colange, Maximilien I-99 Conchon, Sylvain II-132 Cordeiro, Lucas II-437 Costa, Gabriele I-344

D'Argenio, Pedro R. II-340 Darke, Priyanka II-457 Darulova, Eva I-270 Datar, Advaita II-457

Degano, Pierpaolo I-344 Deifel, Hans-Peter II-361 Dietsch, Daniel II-447, II-452 Dong, Wei II-422 Dragomir, Iulia II-201 Duan, Zhao II-432 Duan, Zhenhua II-432 Dureja, Rohit I-309 Dwyer, Matthew B. II-249

Esparza, Javier II-3

Fedyukovich, Grigory I-251, II-176 Ferrère, Thomas II-303 Finkbeiner, Bernd II-194 Fontaine, Pascal II-112

Gacek, Andrew II-176 Galletta, Letterio I-344 Garg, Pranav I-232 Greitschus, Marius II-447, II-452 Guo, Huajun II-176 Guo, Shu-yu II-55 Gurfinkel, Arie II-176

Hahn, Christopher II-194 Hartmanns, Arnd II-320, II-340 Haslbeck, Maximilian P. L. I-155 Hausmann, Daniel II-361 Havlena, Vojtěch II-155 Heizmann, Matthias II-266, II-447, II-452 Heule, Marijn J. H. II-75 Hoenicke, Jochen II-447, II-452 Holík, Lukáš II-155 Huang, Xiaowei I-408

Iguernlala, Mohamed II-132 Iosif, Radu II-93 Izycheva, Anastasiia I-270

Jiang, Chuan I-328 Jonsson, Bengt II-229 Junges, Sebastian II-320 Katis, Andreas II-176 Katoen, Joost-Pieter II-320 Kim, Seonmo I-133 Kobayashi, Naoki I-365 Kordon, Fabrice I-99 Koskinen, Eric I-115 Křetínský, Jan I-385 Kumar, Shrawan I-213, II-457 Kwiatkowska, Marta I-408 Lammich, Peter I-61 Lång, Magnus II-229 Le, Quang Loc I-41 Lebeltel, Olivier II-303 Leike, Jan II-266 Lengál, Ondřej II-155 Li, Yong II-447 Li, Yunchou II-422 Liu, Wanwei II-422 Madhusudan, P. I-232 Maler, Oded II-303 Malík, Viktor II-417 Mandrykin, Mikhail II-427 Markey, Nicolas II-38 Marsso, Lina II-211 Martiček, Štefan II-417 Mateescu, Radu II-211 Mattarei, Cristian II-55 McCamant, Stephen I-133 Medicherla, Raveendra Kumar II-457 Menezes, Rafael II-437 Metin, Hakan I-99 Meyer, Philipp J. II-3 Meyer, Roland II-20 Müller, Peter I-190 Musa, Betim II-447 Mutilin, Vadim II-427 Namjoshi, Kedar S. II-379 Nasir, Fariha I-270 Neider, Daniel I-232 Nelson, Bradley II-55 Ničković, Dejan II-303 Nipkow, Tobias I-155 Nutz, Alexander II-447, II-452 Ong, C.-H. Luke II-432 Park, Daejun I-232 Podelski, Andreas II-447, II-452

Prabhu, Sumanth II-457 Preoteasa, Viorel II-201 Qin, Shengchao I-41 Quatmann, Tim II-320 Reger, Giles I-3 Reynolds, Andrew II-112 Ritter, Fabian I-270 Rocha, Herbert II-437 Roux, Pierre II-132 Rozier, Kristin Yvonne I-309 Sagonas, Konstantinos II-229 Saha, Shambwaditya I-232

Saivasan, Prakash II-20 Sanyal, Amitabha I-213 Sato, Ryosuke I-365 Schilling, Christian II-447, II-452 Schindler, Tanja II-447, II-452 Schrammel, Peter II-417 Schröder, Lutz II-361 Schupp, Stefan II-287 Sedwards, Sean II-340 Serwe, Wendelin II-211 Shah, Punit I-213 Sherman, Elena II-249 Sighireanu, Mihaela I-172 Silvetti, Simone II-396 Smith, Ben II-55 Srivas, Mandayam II-417 Stenger, Marvin II-194 Strejček, Jan II-442 Suda, Martin I-3 Summers, Alexander J. I-190 Sun, Jun I-41

Tentrup, Leander II-194 Tian, Cong II-432 Toman, Viktor I-385 Trefler, Richard J. II-379 Tripakis, Stavros II-201 Tripp, Omer I-115

Ulus, Dogan II-303

van Dijk, Tom I-291 Vasilyev, Anton II-427 Venkatesh, R. I-213, II-457 Vitovská, Martina II-442 Vojnar, Tomáš II-155, II-417 Völzer, Hagen II-3 Voronkov, Andrei I-3

Wahlang, Johanan II-417 Wang, Ji II-422 Whalen, Michael W. II-176 Wicker, Matthew I-408 Wimmer, Simon I-61 Xu, Xiao II-93 Yin, Liangze II-422 Zhan, Bohua I-23